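I want to use Java code to get the estimated result count for certain Google search queries (across the whole web).
I only need to run a very small number of queries per day, so at first the Google Web Search API, although deprecated, looked good enough (see e.g. How can you search Google Programmatically Java API). But it turns out that the numbers this API returns are very different from the ones www.google.com returns (see e.g. http://code.google.com/p/google-ajax-apis/issues/detail?id=32). So those numbers are useless to me.
I also tried the Google Custom Search engine, and it shows the same problem.
What do you think is the simplest solution for my task?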
Answer 0 (score: 4)
/** @author RAJESH Kharche */
// Open NetBeans
// Choose Java -> Project
// Name it GoogleSearchAPP
package googlesearchapp;

import java.io.IOException;
import java.net.URL;
import java.net.URLConnection;
import java.net.URLEncoder;
import java.util.Scanner;
import java.util.logging.Level;
import java.util.logging.Logger;

public class GoogleSearchAPP {

    public static void main(String[] args) {
        try {
            Scanner input = new Scanner(System.in);
            System.out.println("Enter Query to search: "); // get the query to search
            // nextLine() keeps multi-word queries intact; next() would stop at the first space
            String query = input.nextLine();
            long result = getResultsCount(query);
            System.out.println("Results:" + result);
        } catch (IOException ex) {
            Logger.getLogger(GoogleSearchAPP.class.getName()).log(Level.SEVERE, null, ex);
        }
    }

    private static long getResultsCount(final String query) throws IOException {
        final URL url = new URL("https://www.google.com/search?q=" + URLEncoder.encode(query, "UTF-8"));
        final URLConnection connection = url.openConnection();
        connection.setConnectTimeout(60000);
        connection.setReadTimeout(60000);
        connection.addRequestProperty("User-Agent", "Google Chrome/36"); // put the browser name/version
        // try-with-resources closes the scanner on every exit path
        try (Scanner reader = new Scanner(connection.getInputStream(), "UTF-8")) {
            while (reader.hasNextLine()) { // for each line of the response
                final String line = reader.nextLine();
                // scan line by line for the "resultStats" field; the number we want follows it
                if (!line.contains("\"resultStats\">")) {
                    continue;
                }
                // extract the number and parse it as long; result counts can exceed the int range
                return Long.parseLong(line.split("\"resultStats\">")[1].split("<")[0].replaceAll("[^\\d]", ""));
            }
        }
        return 0; // no "resultStats" field was found
    }
}
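As a side note on the URL-building step in the code above, here is a minimal sketch of what URLEncoder does to a query before it is appended to the search URL. The class name SearchUrlBuilder and the method buildSearchUrl are made up for illustration and are not part of the answer's code.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper, shown only to illustrate the encoding step.
final class SearchUrlBuilder {
    private static final String BASE = "https://www.google.com/search?q=";

    // Form-encodes the query (spaces become '+', reserved characters become %XX)
    // and appends it to the search URL.
    static String buildSearchUrl(String query) {
        try {
            return BASE + URLEncoder.encode(query.trim(), "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // Every Java platform is required to support UTF-8.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(buildSearchUrl("java search api"));
        // prints https://www.google.com/search?q=java+search+api
        System.out.println(buildSearchUrl("C++ & Java"));
        // prints https://www.google.com/search?q=C%2B%2B+%26+Java
    }
}
```

Without the encoding, a query such as "C++ & Java" would be cut off at the "&", which the server treats as the start of the next URL parameter.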
Answer 1 (score: 1)
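What you can do is perform an actual Google search programmatically. The simplest way is to request the URL https://www.google.com/search?q=QUERY_HERE and then scrape the result count from the returned page.
Here is a quick example of how to do this: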
private static long getResultsCount(final String query) throws IOException {
    final URL url = new URL("https://www.google.com/search?q=" + URLEncoder.encode(query, "UTF-8"));
    final URLConnection connection = url.openConnection();
    connection.setConnectTimeout(60000);
    connection.setReadTimeout(60000);
    connection.addRequestProperty("User-Agent", "Mozilla/5.0");
    // try-with-resources closes the scanner on every exit path
    try (Scanner reader = new Scanner(connection.getInputStream(), "UTF-8")) {
        while (reader.hasNextLine()) {
            final String line = reader.nextLine();
            if (!line.contains("<div id=\"resultStats\">")) {
                continue;
            }
            // Parse as long: result counts can exceed the int range
            return Long.parseLong(line.split("<div id=\"resultStats\">")[1].split("<")[0].replaceAll("[^\\d]", ""));
        }
    }
    return 0; // no result-count element was found
}
To use the method, you can do the following:
final long count = getResultsCount("horses");
System.out.println("Estimated number of results for horses: " + count);
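If you want to try the extraction step without hitting the network, a sketch along these lines may help. It pulls the count out of an HTML fragment with a regular expression instead of split(). The class ResultStatsParser, the method parseCount, and the sample markup are all assumptions made for illustration; the markup only mirrors the "resultStats" element the code above looks for.

```java
import java.util.OptionalLong;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: isolates the parsing step so it can be tested offline.
final class ResultStatsParser {
    // Captures the text inside an element with id="resultStats", up to the next tag.
    private static final Pattern STATS =
            Pattern.compile("id=\"resultStats\"[^>]*>([^<]*)<");

    // Returns the count if the element is present and contains digits, otherwise empty.
    static OptionalLong parseCount(String html) {
        Matcher m = STATS.matcher(html);
        if (!m.find()) {
            return OptionalLong.empty();
        }
        // Drop everything except digits, e.g. "About 3,250,000,000 results" -> "3250000000".
        String digits = m.group(1).replaceAll("\\D", "");
        if (digits.isEmpty()) {
            return OptionalLong.empty();
        }
        return OptionalLong.of(Long.parseLong(digits));
    }

    public static void main(String[] args) {
        // Assumed sample markup, not a captured response.
        String sample = "<div id=\"resultStats\">About 3,250,000,000 results<nobr> (0.52 seconds)</nobr></div>";
        System.out.println(parseCount(sample).getAsLong()); // prints 3250000000
        System.out.println(parseCount("<div>nothing here</div>").isPresent()); // prints false
    }
}
```

Returning an OptionalLong lets the caller tell "element not found" apart from a genuine count of 0. Note that stripping all non-digits merges every number in the captured text, so this only works while that text contains a single number.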