Getting Google search results using Java

Time: 2016-04-23 10:54:43

Tags: java search jsoup

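I am using jsoup from Java to fetch Google search results, but it returns links that have little to do with the query. Below is the code I have so far.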

import java.io.IOException;
import java.net.URLEncoder;
import java.util.HashSet;
import java.util.Scanner;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class FunnyCrawler {

  private static final String DOMAIN_NAME_PATTERN
      = "([a-zA-Z0-9]([a-zA-Z0-9\\-]{0,61}[a-zA-Z0-9])?\\.)+[a-zA-Z]{2,6}";
  private static final Pattern patternDomainName = Pattern.compile(DOMAIN_NAME_PATTERN);

  public static void main(String[] args) {
    // read the search query from standard input
    Scanner input = new Scanner(System.in);
    FunnyCrawler obj = new FunnyCrawler();
    String str = input.nextLine();
    Set<String> result = obj.getDataFromGoogle(str);
    for (String temp : result) {
      System.out.println(temp);
    }
    System.out.println(result.size());
  }

  // returns the first domain-like substring of url, lower-cased, or "" if none
  public String getDomainName(String url) {
    String domainName = "";
    Matcher matcher = patternDomainName.matcher(url);
    if (matcher.find()) {
      domainName = matcher.group(0).toLowerCase().trim();
    }
    return domainName;
  }

  private Set<String> getDataFromGoogle(String query) {
    Set<String> result = new HashSet<String>();
    try {
      // URL-encode the query so spaces and special characters survive
      String request = "https://www.google.com/search?q="
          + URLEncoder.encode(query, "UTF-8") + "&num=20";
      System.out.println("Sending request..." + request);

      // need http protocol, set this as a Google bot agent :)
      Document doc = Jsoup
          .connect(request)
          .userAgent(
            "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)")
          .timeout(5000).get();

      // get all links
      Elements links = doc.select("a[href]");
      for (Element link : links) {
        String temp = link.attr("href");
        if (temp.startsWith("/url?q=")) {
          // use regex to get domain name
          String domain = getDomainName(temp);
          if (!domain.isEmpty()) {
            result.add(domain);
          }
        }
      }
    } catch (IOException e) {
      e.printStackTrace();
    }
    return result;
  }
}
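
A note on the link handling above: the regex runs over the whole href, so it picks up the first domain-like substring wherever that happens to be, which may be one reason some results look unrelated. Below is a minimal sketch of an alternative that isolates the `q` value, decodes it, and reads the host with `java.net.URI`. It assumes the href has the shape `/url?q=<encoded target>&...`, which is only what the `startsWith("/url?q=")` check above implies; the class name `RedirectLinkParser` and the method `extractTargetHost` are hypothetical names used for illustration, not part of jsoup or of the original code.

```java
import java.io.UnsupportedEncodingException;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URLDecoder;
import java.util.Locale;

public class RedirectLinkParser {

    // Assumed prefix of a result link, taken from the check in the code above.
    private static final String PREFIX = "/url?q=";

    /**
     * Returns the lower-cased host of the target URL wrapped in a
     * "/url?q=..." href, or null if the href has a different shape
     * or the target is not an absolute URL that java.net.URI accepts.
     */
    public static String extractTargetHost(String href) {
        if (href == null || !href.startsWith(PREFIX)) {
            return null;
        }
        // Keep only the value of q; further parameters start at the first '&'.
        String rest = href.substring(PREFIX.length());
        int amp = rest.indexOf('&');
        String encoded = amp >= 0 ? rest.substring(0, amp) : rest;
        try {
            String target = URLDecoder.decode(encoded, "UTF-8");
            String host = new URI(target).getHost();
            return host == null ? null : host.toLowerCase(Locale.ROOT);
        } catch (UnsupportedEncodingException | URISyntaxException
                | IllegalArgumentException e) {
            // Malformed escape sequence or a target that URI cannot parse.
            return null;
        }
    }

    public static void main(String[] args) {
        // Absolute target: the host is extracted.
        System.out.println(extractTargetHost(
            "/url?q=http://www.example.com/page%3Fid%3D1&sa=U&ved=0"));
        // Relative target: there is no host, so null is returned.
        System.out.println(extractTargetHost("/url?q=/settings/ads/preferences&sa=U"));
    }
}
```

In the loop above, `result.add(getDomainName(temp))` could then be replaced by a null-checked call to `extractTargetHost(temp)`. Note that `java.net.URI` is strict, so targets containing characters it rejects are skipped rather than guessed at.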

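So, is there another way in Java to get the same Google search results that I would normally see on google.com? I am working with JSP and servlets, so the code needs to run inside a servlet.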

0 answers:

No answers.