Question

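I am using Jsoup to extract data from a website by ZIP code. The ZIP codes are read from a text file and the results are written to the console. I have about 1500 ZIP codes. The program throws two kinds of exceptions: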

org.jsoup.HttpStatusException: HTTP error fetching URL. Status=500, URL=http://www.moving.com/real-estate/city-profile/...

java.net.SocketTimeoutException: Read timed out

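I thought the solution was to read only a small amount of data at a time. So I used a counter to count off 200 ZIP codes from the text file, and after getting the data for 200 ZIP codes I pause the program for 5 minutes. As I said, I still get the exceptions. So far, whenever I see an exception, I copy and paste the data that is available and then continue with the following ZIP codes. But I want to read all the data without interruption. Is this possible? Any hints would be appreciated. Thanks in advance!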

Here is my code for reading all the data:

    while (br.ready())
    {
        count++;

        // One ZIP code per line of the input file
        String s = br.readLine();
        String str = "http://www.moving.com/real-estate/city-profile/results.asp?Zip=" + s;
        Document doc = Jsoup.connect(str).get();

        for (Element table : doc.select("table.DataTbl"))
        {
            for (Element row : table.select("tr"))
            {
                Elements tds = row.select("td");
                if (tds.size() > 1)
                {
                    // Print the ZIP code and the third column of the income row
                    if (tds.get(0).text().contains("Per capita income"))
                        System.out.println(s + "," + tds.get(2).text());
                }
            }
        }

        // Pause for 5 minutes after every 200 ZIP codes
        if (count % 200 == 0)
        {
            Thread.sleep(300000);
            System.out.println("Stopped for 5 minutes");
        }
    }

Answer 1

Update the line Document doc = Jsoup.connect(str).get(); to set a timeout:

        Connection conn = Jsoup.connect(str);
        conn.timeout(300000); //5 minutes
        Document doc = conn.get();
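
A longer timeout addresses the SocketTimeoutException, but it will not help with the HTTP 500 responses. Since both exceptions from the question are subclasses of IOException, one possible approach is to catch IOException and retry a few times before giving up on a ZIP code. The following is only a rough sketch under that assumption: the class RetrySketch, the helper withRetries, and the attempt count and pause values are hypothetical names and numbers chosen for illustration, not part of jsoup. The fetch is passed in as a Callable so the sketch runs without the library; the demo in main uses a fake task that fails twice.

```java
import java.io.IOException;
import java.util.concurrent.Callable;

class RetrySketch {

    /**
     * Hypothetical helper: runs the task and retries when it throws an
     * IOException, pausing between attempts. Returns null when every
     * attempt has failed, so the caller can skip that item and move on.
     */
    static <T> T withRetries(Callable<T> task, int maxAttempts, long pauseMillis)
            throws InterruptedException {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return task.call();
            } catch (IOException e) {
                // Covers HttpStatusException and SocketTimeoutException
                System.err.println("Attempt " + attempt + " failed: " + e.getMessage());
                if (attempt < maxAttempts) {
                    Thread.sleep(pauseMillis);
                }
            } catch (Exception e) {
                // Anything else is unexpected, so do not retry it
                throw new RuntimeException(e);
            }
        }
        return null;
    }

    public static void main(String[] args) throws InterruptedException {
        // Fake task standing in for the real fetch: fails twice, then succeeds
        int[] calls = {0};
        String result = RetrySketch.<String>withRetries(() -> {
            calls[0]++;
            if (calls[0] < 3) {
                throw new IOException("simulated failure");
            }
            return "ok";
        }, 5, 10);
        System.out.println("Result: " + result + " (calls: " + calls[0] + ")");
    }
}
```

In the loop from the question, the call would presumably look like Document doc = RetrySketch.withRetries(() -> Jsoup.connect(str).timeout(300000).get(), 3, 5000); followed by a null check that logs the failed ZIP code and continues with the next one.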

Answer 2

        Connection conn = Jsoup.connect(str);
        conn.timeout(0); // infinite timeout

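Set the request timeouts (connect and read). If a timeout occurs, an IOException will be thrown. The default timeout is 3 seconds (3000 millis). A timeout of zero is treated as an infinite timeout.

Parameters:
millis - number of milliseconds before timing out connects or reads.
Returns:
this Connection, for chaining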

Source: jsoup API

Set the timeout to zero. That way you will not have to pause for 5 minutes.

Exceptions when extracting data from a website
