ResponseException in Jaunt

Time: 2016-10-19 10:03:55

Tags: exception web-scraping jaunt-api

This is the error:

message: UserAgent.sendGET; response error

requestUrl: https://www.linkedin.com/directory/topics-c/

response: requestURL: https://www.linkedin.com/directory/topics-c/

status: 999

This is my code:

    try {
        Document doc = userAgent.visit(link);
        Elements eles = doc.findEvery("<ul class=\"column quad-column\">");
        for (int i = 0; i < eles.size(); i++) {
            Elements href_keywords = eles.getElement(i).findEvery("<a href>");
            for (int j = 0; j < href_keywords.size(); j++) {
                keywords.add(href_keywords.getElement(j).getText());
            }
        }
    } catch (JauntException e) {
        System.err.println(e);
    }

1 Answer:

Answer 0 (score: 0)

You should find the elements through the user agent's document, like this:

Elements eles = userAgent.doc.findEvery("<ul class=\"column quad-column\">");

Here is the full code:

package scrap;

import com.jaunt.*;

public class Scrap {

    public static void main(String[] args) {
        try {
            UserAgent userAgent = new UserAgent();
            userAgent.visit("https://www.linkedin.com/directory/topics-c/");
            // System.out.println(userAgent.doc.innerHTML());
            Elements eles = userAgent.doc.findEvery("<ul class=\"column quad-column\">");
            for (int i = 0; i < eles.size(); i++) {
                Elements href_keywords = eles.getElement(i).findEvery("<a href>");
                for (int j = 0; j < href_keywords.size(); j++) {
                    // Add the link text to your list here.
                    System.out.println(href_keywords.getElement(j).getText()); 
                }
            }
        } catch (JauntException e) {
            System.err.println(e);
        }
    }
}
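
One more thing worth checking: the status 999 in the question is not in any of the standard HTTP status classes (1xx to 5xx). It is commonly reported as what LinkedIn returns for requests it treats as automated, so the parsing code may never receive a page at all. Below is a minimal sketch, independent of Jaunt, of classifying a status code before trying to parse anything. The class `StatusCheck` and its methods are hypothetical names made up for this illustration and are not part of any library.

```java
public class StatusCheck {

    /** Returns true if the code falls in the standard HTTP classes 1xx to 5xx. */
    public static boolean isStandardHttpStatus(int status) {
        return status >= 100 && status <= 599;
    }

    /** Returns a short hint describing what the status code means for a scraper. */
    public static String describe(int status) {
        if (!isStandardHttpStatus(status)) {
            // Assumption: a code outside 100-599 means the server refused the request.
            return "non-standard status " + status + ": request was probably refused";
        }
        if (status >= 200 && status <= 299) {
            return "success";
        }
        return "HTTP status " + status + " (not a success)";
    }

    public static void main(String[] args) {
        int[] samples = {200, 404, 999};
        for (int status : samples) {
            System.out.println(status + " -> " + describe(status));
        }
    }
}
```

If the status turns out to be non-standard, changing the element query will not help, because there is no document to query in the first place.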