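I am trying to extract some data from the link below. I can retrieve the content of every element on the page except the one that holds the dynamic time. I have tried every CSS selector I could think of, but none of them work. You may want to inspect the elements of this page.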
<div class="meta">
<span class="information-row properties"></span>
<span class="information-row">
<span class="date" data-time="1414176068">
Today 12:41 am <!-- This is the text I want to extract -->
</span>
,
<span class="category"></span>
,
<span class="location"></span>
</span>
</div>
import java.io.IOException;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class Main {
    /**
     * @param args
     */
    public static void main(String[] args) throws IOException {
        Document doc = Jsoup.connect("http://bikroy.com/en/ads-in-bangladesh?query=Nokia&category=&location=")
                .userAgent("Chrome")
                .timeout(999999)
                .get();
        Elements titles = doc.select("div.title");
        Elements prices = doc.select("span.data");
        Elements locations = doc.select("span.location");
        Elements dates = doc.select("span.date");
        //Elements dates = doc.select("[data-time*=14]");
        for (int i = 0; i < titles.size(); i++) {
            System.out.println("\nTitle: " + titles.get(i).text()
                    + "\nPrice: " + prices.get(i).text()
                    + "\nLocation: " + locations.get(i).text()
                    + "\nDate: " + dates.get(i).text());
        }
    }
}
Title: Brand New Nokia Lumia 530 Dual Sim
Price: Tk. 8,000
Location: Dhaka
Date:
Title: Nokia n95
Price: Tk. 999
Location: Dhaka
Date:
Title: Nokia c3-01
Price: Tk. 3,500
Location: Dhaka
Date:
See? The date is empty! How can I fix this?
Update
import java.io.IOException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class Main {
    public static void main(String[] args) throws IOException {
        Document doc = Jsoup.connect("http://bikroy.com/en/ads-in-bangladesh?query=Nokia&category=&location=")
                .userAgent("Chrome")
                .timeout(999999)
                .get();
        Elements titles = doc.select("div.title");
        Elements prices = doc.select("span.data");
        Elements locations = doc.select("span.location");
        Elements dates = doc.select("span.date");

        // The visible date text is filled in by JavaScript, so read the raw data-time attribute instead.
        String[] d = new String[dates.size()];
        int j = 0;
        for (Element date : dates) {
            d[j++] = date.attr("data-time");
        }

        // data-time holds a Unix timestamp in seconds; format it at GMT+6.
        SimpleDateFormat sdf = new SimpleDateFormat("dd-MMM-yyyy hh:mm:ss a");
        sdf.setTimeZone(TimeZone.getTimeZone("GMT+6"));

        for (int i = 0; i < titles.size(); i++) {
            long unixSeconds = Long.parseLong(d[i]);
            Date dt = new Date(unixSeconds * 1000L); // Date expects milliseconds
            String fd = sdf.format(dt);
            System.out.println("\nTitle: " + titles.get(i).text()
                    + "\nPrice: " + prices.get(i).text()
                    + "\nLocation: " + locations.get(i).text()
                    + "\nPosted On: " + fd);
        }
    }
}
New output:
Title: Nokia C5-00
Price: Tk. 2,850
Location: Dhaka Division
Posted On: 26-Oct-2014 11:46:39 PM
Title: Nokia lumia 1320
Price: Negotiable price
Location: Dhaka
Posted On: 26-Oct-2014 11:39:13 PM
Title: Nokia N73
Price: Tk. 1,000
Location: Dhaka
Posted On: 26-Oct-2014 11:37:14 PM
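Answer 0 (score: 1)
If you look at the raw HTML of the page, the span with class="date" has no content. The page must be reading the data-time attribute and converting it with JavaScript.
I think you have to do the same: read data-time and convert it to a date.
Hint: data-time looks like a timestamp in seconds.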
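The conversion described in this answer can be sketched with only the JDK's java.time classes. This is a minimal sketch, not the site's own logic: the class name DataTimeConverter and its format method are hypothetical, the UTC+6 offset is an assumption taken from the question's update, and the output pattern is chosen only to avoid locale-dependent month and AM/PM names. The input is the data-time value from the HTML snippet in the question.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

// Hypothetical helper: turns a data-time attribute value into readable text.
class DataTimeConverter {
    // Assumption: times are displayed at UTC+6, as in the question's update.
    private static final ZoneOffset OFFSET = ZoneOffset.ofHours(6);

    // Numeric-only pattern, so the result does not depend on the default locale.
    private static final DateTimeFormatter FORMAT =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");

    // dataTime is the raw attribute string, e.g. the result of element.attr("data-time").
    static String format(String dataTime) {
        long epochSeconds = Long.parseLong(dataTime.trim());
        return Instant.ofEpochSecond(epochSeconds).atOffset(OFFSET).format(FORMAT);
    }

    public static void main(String[] args) {
        // Value taken from the <span class="date"> in the question.
        System.out.println(format("1414176068")); // prints 2014-10-25 00:41:08
    }
}
```

The result, 00:41 on 25 October 2014, agrees with the "Today 12:41 am" text the browser shows for that span, which supports reading data-time as seconds since the Unix epoch.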