"java.net.SocketTimeoutException: Read timed out" on one specific page

Time: 2016-04-26 10:46:38

Tags: java web-scraping jsoup

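I wrote a web scraper that extracts data from a page and stores it in a .csv file. The program works on several pages, but for one particular link it fails with "java.net.SocketTimeoutException: Read timed out" on the line where I create the jsoup connection. I don't understand why the error occurs only on that page. My code and the log are below. Note: I am using the jsoup HTML parser, Java 1.7, and NetBeans.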

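package com.open_end_fund;

import java.io.File;
import java.io.FileWriter;
import java.io.IOException;
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.ArrayList;
import java.util.Date;
import java.util.Iterator;
import java.util.Locale;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class ComOpen_end_fund {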

    boolean writeCSVToConsole = true;
    boolean writeCSVToFile = true;
    boolean sortTheList = true;
    boolean writeToConsole;
    boolean writeToFile;
    public static Document doc = null;
    public static Elements tbodyElements = null;
    public static Elements elements = null;
    public static Elements tdElements = null;
    public static Elements trElement2 = null;
    public static String Dcomma = ",";
    public static String line = "";
    public static ArrayList<Elements> sampleList = new ArrayList<Elements>();

    public static void createConnection() throws IOException {
        System.setProperty("http.proxyHost", "191.1.1.202");
        System.setProperty("http.proxyPort", "8080");
        String tempUrl = "http://mufap.com.pk/nav-report.php?tab=01&fname=&amc=&cat=&strdate=&endate=&submitted=&mnt=&yrs=&s=";
        doc = Jsoup.connect(tempUrl).get(); //this is line number 42
    }

    public static void parsingHTML() throws Exception {
        // Recreate the output file once, not once per table row.
        File csvFile = new File("C:\\open-end-fund.csv");
        csvFile.delete();
        try (FileWriter sb = new FileWriter(csvFile, true)) {
            for (Element table : doc.getElementsByTag("table")) {
                for (Element trElement : table.getElementsByTag("tr")) {
                    tdElements = trElement.getElementsByTag("td");
                    if (trElement.hasClass("tab-data")) {
                        // One CSV line per data row; cells are separated by commas.
                        for (Iterator<Element> it = tdElements.iterator(); it.hasNext();) {
                            sb.append(formatData(it.next().text()));
                            if (it.hasNext()) {
                                sb.append("   ,   ");
                            }
                        }
                        sb.append("\r\n");
                    }
                    sampleList.add(tdElements);
                }
            }
        } // the writer is flushed and closed here, after all rows are written
    }
    private static final SimpleDateFormat FORMATTER_MMM_d_yyyy = new SimpleDateFormat("MMM d, yyyy", Locale.US);
    // "yyyy" is the calendar year; "YYYY" would be the week year, which can differ around New Year.
    private static final SimpleDateFormat FORMATTER_dd_MMM_yyyy = new SimpleDateFormat("dd-MMM-yyyy", Locale.US);

    public static String formatData(String text) {
        String tmp = null;

        try {
            Date d = FORMATTER_MMM_d_yyyy.parse(text);
            tmp = FORMATTER_dd_MMM_yyyy.format(d);
        } catch (ParseException pe) {
            tmp = text;
        }

        return tmp;
    }

    public static void main(String[] args) throws IOException, Exception {
        createConnection(); //this is line number 100
        parsingHTML();

    }

}

Here is the log:

Exception in thread "main" java.net.SocketTimeoutException: Read timed out
    at java.net.SocketInputStream.socketRead0(Native Method)
    at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
    at java.net.SocketInputStream.read(SocketInputStream.java:170)
    at java.net.SocketInputStream.read(SocketInputStream.java:141)
    at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
    at java.io.BufferedInputStream.read1(BufferedInputStream.java:286)
    at java.io.BufferedInputStream.read(BufferedInputStream.java:345)
    at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:704)
    at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:647)
    at sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1536)
    at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1441)
    at java.net.HttpURLConnection.getResponseCode(HttpURLConnection.java:480)
    at org.jsoup.helper.HttpConnection$Response.execute(HttpConnection.java:516)
    at org.jsoup.helper.HttpConnection$Response.execute(HttpConnection.java:493)
    at org.jsoup.helper.HttpConnection.execute(HttpConnection.java:205)
    at org.jsoup.helper.HttpConnection.get(HttpConnection.java:194)
    at com.open_end_fund.ComOpen_end_fund.createConnection(ComOpen_end_fund.java:42)
    at com.open_end_fund.ComOpen_end_fund.main(ComOpen_end_fund.java:100)
C:\Users\talha\AppData\Local\NetBeans\Cache\8.1\executor-snippets\run.xml:53: Java returned: 1
BUILD FAILED (total time: 3 seconds)

When I run the same code against http://www.mufap.com.pk/nav_returns_performance.php?tab=01, it works fine.

1 answer:

Answer 0 (score: 3):

You can try increasing the timeout:

Jsoup.connect(url).timeout(30000).get();

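This sets the timeout to 30 seconds. The default was 3 seconds in the jsoup releases of that time (more recent releases document a 30-second default). If you set it to 0, it behaves as an infinite timeout.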
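
To see what this timeout governs, here is a minimal sketch that uses only the JDK. It assumes, as the stack trace in the question suggests, that the request ends up in HttpURLConnection. The class ReadTimeoutDemo, its method timesOut, and the throwaway local server are made up for illustration and are not part of jsoup. The server answers only after a delay, and the client gives up with SocketTimeoutException when the delay exceeds its read timeout.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.Proxy;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;
import java.net.URL;

class ReadTimeoutDemo {

    /**
     * Starts a throwaway local server that waits serverDelayMillis before
     * answering, then requests it with the given read timeout.
     * Returns true if the read timed out.
     */
    static boolean timesOut(final long serverDelayMillis, int readTimeoutMillis) throws IOException {
        final ServerSocket server = new ServerSocket(0); // port 0 = any free port
        Thread slowServer = new Thread(new Runnable() {
            @Override
            public void run() {
                try {
                    Socket client = server.accept();
                    BufferedReader in = new BufferedReader(
                            new InputStreamReader(client.getInputStream(), "ISO-8859-1"));
                    String line;
                    while ((line = in.readLine()) != null && !line.isEmpty()) {
                        // skip the request line and headers
                    }
                    Thread.sleep(serverDelayMillis); // simulate a slow page
                    client.getOutputStream().write(
                            "HTTP/1.1 200 OK\r\nContent-Length: 2\r\nConnection: close\r\n\r\nok"
                                    .getBytes("ISO-8859-1"));
                    client.close();
                } catch (Exception ignored) {
                    // the client may already have given up; irrelevant for the demo
                }
            }
        });
        slowServer.setDaemon(true);
        slowServer.start();

        URL url = new URL("http://127.0.0.1:" + server.getLocalPort() + "/");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection(Proxy.NO_PROXY);
        conn.setConnectTimeout(2000);
        conn.setReadTimeout(readTimeoutMillis);
        try {
            conn.getResponseCode(); // blocks until the status line arrives
            return false;
        } catch (SocketTimeoutException e) {
            return true; // the read timeout elapsed before the server answered
        } finally {
            conn.disconnect();
            server.close();
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("delay 1500 ms, timeout 300 ms: timed out = " + timesOut(1500, 300));
        System.out.println("delay 100 ms, timeout 5000 ms: timed out = " + timesOut(100, 5000));
    }
}
```

The first call should report a timeout and the second should not, which mirrors the situation in the question: a page that takes longer to respond than the configured timeout fails, while a faster page on the same site succeeds.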

https://jsoup.org/apidocs/org/jsoup/Connection.html#timeout-int-
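
If a single larger timeout is still not reliable, one possible extension is to retry with a growing timeout. The sketch below is not part of jsoup: TimeoutRetry, Attempt, and fetchWithRetry are hypothetical names, and doubling the timeout is just one reasonable policy. It is written without lambdas so that it compiles on Java 1.7.

```java
import java.net.SocketTimeoutException;

class TimeoutRetry {

    /** Hypothetical callback: performs one fetch attempt with the given timeout. */
    interface Attempt<T> {
        // With jsoup, the body could be:
        //   return Jsoup.connect(url).timeout(timeoutMillis).get();
        T run(int timeoutMillis) throws Exception;
    }

    /**
     * Runs the attempt up to maxAttempts times, doubling the timeout after
     * every SocketTimeoutException. Other exceptions are not retried.
     */
    static <T> T fetchWithRetry(Attempt<T> attempt, int firstTimeoutMillis, int maxAttempts)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        int timeout = firstTimeoutMillis;
        SocketTimeoutException last = null;
        for (int i = 1; i <= maxAttempts; i++) {
            try {
                return attempt.run(timeout);
            } catch (SocketTimeoutException e) {
                last = e;
                timeout *= 2; // give the server more time on the next try
            }
        }
        throw last; // every attempt timed out
    }
}
```

Only SocketTimeoutException triggers a retry here, so errors such as an unknown host still fail immediately instead of being retried pointlessly.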