Jsoup: Unable to retrieve time/date element

Posted: 2014-10-24 21:14:31

Tags: java css parsing jsoup

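I'm trying to extract some data from the following link. I can retrieve the content of every element on the page except the one that holds the dynamic time. I have tried every CSS selector I can think of, but none of them work. You may want to inspect the elements of this page.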

HTML:

    <div class="meta">
        <span class="information-row properties"></span>
        <span class="information-row">
            <span class="date" data-time="1414176068">

                Today 12:41 am <!-- This is the text I want to extract -->

            </span>

            ,


            <span class="category"></span>

            ,


            <span class="location"></span>
        </span>
    </div>

Java code:

    import java.io.IOException;

    import org.jsoup.Jsoup;
    import org.jsoup.nodes.Document;
    import org.jsoup.select.Elements;

    public class Main {

        public static void main(String[] args) throws IOException {

            Document doc = Jsoup.connect("http://bikroy.com/en/ads-in-bangladesh?query=Nokia&category=&location=")
                    .userAgent("Chrome")
                    .timeout(999999)
                    .get();

            Elements titles = doc.select("div.title");
            Elements prices = doc.select("span.data");
            Elements locations = doc.select("span.location");
            Elements dates = doc.select("span.date");
            //Elements dates = doc.select("[data-time*=14]");

            for (int i = 0; i < titles.size(); i++) {
                System.out.println("\nTitle: " + titles.get(i).text()
                        + "\nPrice: " + prices.get(i).text()
                        + "\nLocation: " + locations.get(i).text()
                        + "\nDate: " + dates.get(i).text());
            }
        }
    }

Sample output:

Title: Brand New Nokia Lumia 530 Dual Sim
Price: Tk. 8,000
Location: Dhaka
Date: 

Title: Nokia n95
Price: Tk. 999
Location: Dhaka
Date: 

Title: Nokia c3-01
Price: Tk. 3,500
Location: Dhaka
Date: 

See? The date is empty! How can I fix this?

Update

    import java.io.IOException;
    import java.text.SimpleDateFormat;
    import java.util.Date;
    import java.util.Locale;
    import java.util.TimeZone;

    import org.jsoup.Jsoup;
    import org.jsoup.nodes.Document;
    import org.jsoup.select.Elements;

    public class Main {

        public static void main(String[] args) throws IOException {

            Document doc = Jsoup.connect("http://bikroy.com/en/ads-in-bangladesh?query=Nokia&category=&location=")
                    .userAgent("Chrome")
                    .timeout(999999)
                    .get();

            Elements titles = doc.select("div.title");
            Elements prices = doc.select("span.data");
            Elements locations = doc.select("span.location");
            Elements dates = doc.select("span.date");

            // Create the formatter once, outside the loop, and display times in GMT+6 (Bangladesh).
            SimpleDateFormat sdf = new SimpleDateFormat("dd-MMM-yyyy hh:mm:ss a", Locale.US);
            sdf.setTimeZone(TimeZone.getTimeZone("GMT+6"));

            // The four lists are matched by index, so stop at the shortest one
            // instead of running into an index-out-of-bounds exception.
            int count = Math.min(Math.min(titles.size(), prices.size()),
                    Math.min(locations.size(), dates.size()));

            for (int i = 0; i < count; i++) {
                // data-time is a Unix timestamp in seconds; Date expects milliseconds.
                long unixSeconds = Long.parseLong(dates.get(i).attr("data-time").trim());
                String fd = sdf.format(new Date(unixSeconds * 1000L));

                System.out.println("\nTitle: " + titles.get(i).text()
                        + "\nPrice: " + prices.get(i).text()
                        + "\nLocation: " + locations.get(i).text()
                        + "\nPosted On: " + fd);
            }
        }
    }

New output:

Title: Nokia C5-00
Price: Tk. 2,850
Location: Dhaka Division
Posted On: 26-Oct-2014 11:46:39 PM

Title: Nokia lumia 1320
Price: Negotiable price
Location: Dhaka
Posted On: 26-Oct-2014 11:39:13 PM

Title: Nokia N73
Price: Tk. 1,000
Location: Dhaka
Posted On: 26-Oct-2014 11:37:14 PM

1 Answer:

Answer 0 (score: 1)

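If you look at the page's raw HTML, the span with class="date" has no content: the page must read the data-time attribute and convert it with JavaScript. Jsoup does not execute JavaScript, so text() on that span returns an empty string.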
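The page's JavaScript is not shown here, so the following is only a guess at what it does. It is a minimal Java sketch that rebuilds a "Today 12:41 am"-style label from data-time, assuming the site compares calendar days in GMT+6 (the zone used in the update) and falls back to a day-and-month label otherwise. The class name `DateLabel`, the method `label`, and the fallback format are all hypothetical.

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneId;
import java.time.ZoneOffset;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

public class DateLabel {

    // Assumption: the site displays times in GMT+6 (Bangladesh).
    private static final ZoneId ZONE = ZoneOffset.ofHours(6);
    private static final DateTimeFormatter TIME =
            DateTimeFormatter.ofPattern("h:mm a", Locale.US);
    // Hypothetical fallback for ads not posted "today".
    private static final DateTimeFormatter DAY =
            DateTimeFormatter.ofPattern("dd MMM", Locale.US);

    // Builds a label like "Today 12:41 am" from a data-time value (Unix seconds),
    // relative to the given "now" instant.
    static String label(long unixSeconds, Instant now) {
        ZonedDateTime posted = Instant.ofEpochSecond(unixSeconds).atZone(ZONE);
        LocalDate today = now.atZone(ZONE).toLocalDate();
        String time = posted.format(TIME).toLowerCase(Locale.US);
        if (posted.toLocalDate().equals(today)) {
            return "Today " + time;
        }
        return posted.format(DAY) + " " + time;
    }

    public static void main(String[] args) {
        // "now" fixed shortly after the ad in the question was posted.
        Instant now = Instant.parse("2014-10-24T19:00:00Z");
        System.out.println(label(1414176068L, now)); // Today 12:41 am
    }
}
```

Passing "now" in as a parameter, rather than calling Instant.now() inside, keeps the method deterministic and easy to check.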

I think you have to do the same: read data-time and convert it to a date yourself.
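A minimal sketch of that conversion using java.time (Java 8+) instead of Date/SimpleDateFormat. It assumes data-time holds Unix seconds and that GMT+6 is the zone to display, as in the update; the class `DataTime` and the helper `formatDataTime` are hypothetical names. Returning an Optional means a missing or non-numeric attribute (jsoup's attr() returns an empty string when the attribute is absent) does not crash the loop.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.Locale;
import java.util.Optional;

public class DataTime {

    // Same pattern as the update's SimpleDateFormat; Locale.US keeps "Oct" and "AM" in English.
    // Assumption: display in GMT+6. DateTimeFormatter is immutable and thread-safe.
    private static final DateTimeFormatter FORMAT =
            DateTimeFormatter.ofPattern("dd-MMM-yyyy hh:mm:ss a", Locale.US)
                    .withZone(ZoneOffset.ofHours(6));

    // Converts a data-time attribute value (Unix seconds) to a display string.
    // Returns Optional.empty() for blank or non-numeric input instead of throwing.
    static Optional<String> formatDataTime(String raw) {
        if (raw == null || raw.trim().isEmpty()) {
            return Optional.empty();
        }
        try {
            long seconds = Long.parseLong(raw.trim());
            return Optional.of(FORMAT.format(Instant.ofEpochSecond(seconds)));
        } catch (NumberFormatException e) {
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        // Value taken from data-time="1414176068" in the question's HTML.
        System.out.println(formatDataTime("1414176068").orElse("(no date)")); // 25-Oct-2014 12:41:08 AM
        System.out.println(formatDataTime("").orElse("(no date)"));           // (no date)
    }
}
```

In the scraping loop this would be called as `formatDataTime(dates.get(i).attr("data-time")).orElse("(no date)")`.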

Hint: the data-time value looks like a Unix timestamp in seconds.
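A small check of why the unit matters: java.util.Date(long) and Instant.ofEpochMilli expect milliseconds, so passing the raw attribute value lands in January 1970. Interpreted as seconds, the value from the question's HTML is 2014-10-24 18:41:08 UTC, which in GMT+6 is 00:41 on 25 October, consistent with the "Today 12:41 am" the page displayed.

```java
import java.time.Instant;

public class EpochUnits {
    public static void main(String[] args) {
        long dataTime = 1414176068L; // from data-time="1414176068" in the question

        // Wrong: treats the seconds value as milliseconds.
        Instant asMillis = Instant.ofEpochMilli(dataTime);
        // Right: interpret it as seconds (equivalent to ofEpochMilli(dataTime * 1000L)).
        Instant asSeconds = Instant.ofEpochSecond(dataTime);

        System.out.println(asMillis);  // 1970-01-17T08:49:36.068Z
        System.out.println(asSeconds); // 2014-10-24T18:41:08Z
    }
}
```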