Retrieving 50 results from Google using Java

Time: 2011-11-26 08:36:04

Tags: java

I'm having trouble retrieving 50 results from Google.

I'm using this:

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLConnection;
import java.net.URLEncoder;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// GOOGLE is a regex String constant for matching result links; it is defined elsewhere and not shown here.
public static List<String> Retrieve(String entry) {
    List<String> entryList = new ArrayList<String>();

    try {
        // Encode the keywords so they can be placed in the query string
        String newEntry = URLEncoder.encode(entry, "UTF-8").replace("+", "%20");

        // Build the search URL from the encoded keywords
        URL url = new URL("http://www.google.co.id/search?q=" + newEntry + "&hl=en&num=10&lr=&ft=i&cr=&safe=images&tbs=");
        // Open the connection
        URLConnection urlConn = url.openConnection();
        urlConn.setUseCaches(false);
        // The header value must not repeat the "User-Agent:" header name
        urlConn.setRequestProperty("User-Agent", "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2.13) Gecko/20101203 Firefox/3.6.13 ( .NET CLR 3.5.30729)");

        // Read the HTML of the page into a single string; the reader is closed automatically
        StringBuilder buffer = new StringBuilder();
        try (BufferedReader buffReader = new BufferedReader(new InputStreamReader(urlConn.getInputStream()))) {
            String line;
            while ((line = buffReader.readLine()) != null) {
                buffer.append(line);
            }
        }

        // Find the links; note that lowercasing the page also lowercases the extracted links
        Pattern p = Pattern.compile(GOOGLE);
        Matcher m = p.matcher(buffer.toString().toLowerCase());
        while (m.find()) {
            // Add each matched link to the result list
            entryList.add(m.group(0));
        }
    } catch (Exception e) {
        System.out.println(e.getMessage());
    }

    return entryList;
}

But it only shows ten results.

3 Answers:

Answer 0: (score: 2)

Loop 5 times, adding 10 to the 'start' GET variable in the URL on each iteration.

for (int i = 0; i < 5; i++)
{
    ...
    // start is the offset of the first result on the page: 0, 10, 20, 30, 40
    URL url = new URL("http://www.google.co.id/search?q=" + newEntry + "&hl=en&num=10&lr=&ft=i&cr=&safe=images&tbs=&start=" + (i * 10));
    ...
}
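
To make the loop concrete, here is a minimal sketch that only builds the five page URLs and prints them, without making any network request. The class name PagedSearchUrls and the method buildPageUrls are hypothetical names made up for this illustration. The sketch assumes the search endpoint still accepts the num and start parameters exactly as in the question's URL, which it cannot confirm.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not part of the original question's code.
class PagedSearchUrls {

    // Builds one search URL per page. "start" is the zero-based offset of the
    // first result on that page, so page i starts at i * perPage.
    static List<String> buildPageUrls(String entry, int pages, int perPage) {
        String query;
        try {
            query = URLEncoder.encode(entry, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is required to be supported by every JVM, so this should not happen.
            throw new IllegalStateException(e);
        }
        List<String> urls = new ArrayList<String>();
        for (int i = 0; i < pages; i++) {
            urls.add("http://www.google.co.id/search?q=" + query
                    + "&hl=en&num=" + perPage + "&start=" + (i * perPage));
        }
        return urls;
    }

    public static void main(String[] args) {
        // 5 pages of 10 results each covers offsets 0, 10, 20, 30 and 40.
        for (String u : buildPageUrls("java tutorial", 5, 10)) {
            System.out.println(u);
        }
    }
}
```

Each URL would then be fetched and parsed the same way as in the question, with the links from all five pages appended to one list.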

Answer 1: (score: 0)

Try going to the next page of the Google search results. I think each page gives you 10 results.

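Answer 2: (score: 0)

All you have to do is change the num GET parameter in the URL from 10 to 50. This parameter indicates the number of results you want displayed.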
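
As a rough sketch of that change, the snippet below rewrites the num value in an existing URL string. The class name NumParam and the method withNum are hypothetical names used only for illustration, and whether the server actually returns 50 results for num=50 is an assumption this code does not check.

```java
// Hypothetical helper for illustration; not part of the original question's code.
class NumParam {

    // Replaces the digits that follow "?num=" or "&num=" with the requested count.
    // If the URL has no num parameter, the string is returned unchanged.
    static String withNum(String url, int num) {
        return url.replaceFirst("(?<=[?&]num=)\\d+", Integer.toString(num));
    }

    public static void main(String[] args) {
        String original = "http://www.google.co.id/search?q=java&hl=en&num=10&lr=";
        System.out.println(withNum(original, 50));
    }
}
```

In the question's code the same effect comes from simply writing num=50 in the URL literal; if the page still returns fewer results, the start-based loop from the first answer is the fallback.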