Exception in thread "main" java.net.SocketTimeoutException: Read timed out in Jsoup

Asked: 2016-09-09 15:46:07

Tags: java web-scraping jsoup

I can open this link manually and run a search in Firefox without any problem, but I cannot connect to it with Jsoup.

Code:

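import java.util.Map;
import org.jsoup.Connection;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

// Step 1: GET the search form to collect cookies and the ASP.NET hidden fields.
String url = "https://www.sosnc.gov/trademrk/search.aspx";
Connection.Response response = Jsoup.connect(url).timeout(45000)
            .method(Connection.Method.GET)
            .ignoreContentType(true)
            .userAgent("Mozilla/5.0 (Windows NT 6.3; rv:40.0) Gecko/20100101 Firefox/40.0")
            .followRedirects(true)
            .execute();
Map<String, String> loginCookies = response.cookies();
Document document = response.parse(); // search form page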
Element __VIEWSTATE = document.select("input[name=__VIEWSTATE]").first();
Element __VIEWSTATEGENERATOR = document.select("input[name=__VIEWSTATEGENERATOR]").first();
Element __PREVIOUSPAGE = document.select("input[name=__PREVIOUSPAGE]").first();
Element __EVENTVALIDATION = document.select("input[name=__EVENTVALIDATION]").first();
response = Jsoup.connect(url).timeout(45000)
            .data("SosMenu_SiteTreeView_ExpandState", "ennnnnnnnnnnn")
            .data("SosMenu_SiteTreeView_PopulateLog", "")
            .data("SosMenu_SiteTreeView_SelectedNode", "SosMenu_SiteTreeViewn2")
            .data("ToolsTreeView_ExpandState", "ennn")
            .data("ToolsTreeView_PopulateLog", "")
            .data("ToolsTreeView_SelectedNode", "")
            .data("__EVENTARGUMENT", "")
            .data("__EVENTTARGET", "")
            .data("__EVENTVALIDATION", __EVENTVALIDATION.val())
            .data("__PREVIOUSPAGE", __PREVIOUSPAGE.val())
            .data("__VIEWSTATE", __VIEWSTATE.val())
            .data("__VIEWSTATEGENERATOR", __VIEWSTATEGENERATOR.val())
            .data("ctl00$ctl00$SosContent$SosContent$Submit1", "Search")
            .data("ctl00$ctl00$SosContent$SosContent$Type", "Goods")
            .data("ctl00$ctl00$SosContent$SosContent$txtSearc", query)
            .cookies(loginCookies)
            .method(Connection.Method.POST)
            .ignoreContentType(true)
            .userAgent("Mozilla/5.0 (Windows NT 6.3; rv:40.0) Gecko/20100101 Firefox/40.0")  
            .header("host", "www.sosnc.gov")
            .referrer("https://www.sosnc.gov/trademrk/search.aspx")  
            .followRedirects(true)
            .execute();
document = response.parse(); // search results
System.out.println(document);

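Am I missing something? This is a Jsoup POST request to the server, so I also added the cookies and the required parameters, but I still cannot get the results.

1 Answer:

Answer 0 (score: 1)

I don't know why you are getting a timeout, but you can easily fetch the data in a simpler way: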

String query = "abc";
String url = "https://www.sosnc.gov/trademrk/results.aspx?searchstr="
        + query
        + "&Type=GOODS";
Connection.Response response = Jsoup.connect(url).timeout(45000)
        .method(Connection.Method.GET)
        .ignoreContentType(true)
        .userAgent("Mozilla/5.0 (Windows NT 6.3; rv:40.0) Gecko/20100101 Firefox/40.0")
        .followRedirects(true)
        .execute();
System.out.println(response.body());

No cookies or extra parameters are needed.
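
One caveat with building the URL by plain concatenation: a search term containing spaces, `&`, or other reserved characters would likely produce a malformed query string. A minimal sketch of encoding the term first is below. It uses only the standard `java.net.URLEncoder`; the class name `SearchUrlBuilder` and the helper `buildSearchUrl` are hypothetical names made up for illustration, and it assumes the site accepts form-style encoding for `searchstr`.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class SearchUrlBuilder {
    // Results endpoint taken from the snippet above.
    private static final String RESULTS_URL =
            "https://www.sosnc.gov/trademrk/results.aspx";

    // Hypothetical helper: builds the GET URL with the search term
    // form-encoded (spaces become '+', '&' becomes %26, and so on).
    static String buildSearchUrl(String query) {
        try {
            String encoded = URLEncoder.encode(query, "UTF-8");
            return RESULTS_URL + "?searchstr=" + encoded + "&Type=GOODS";
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is required on every Java platform, so this is not expected.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(buildSearchUrl("abc"));
        System.out.println(buildSearchUrl("coffee & tea"));
    }
}
```

The resulting string can be passed to `Jsoup.connect(...)` exactly as in the snippet above. For a plain term such as "abc" the URL is unchanged.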