I can't scrape Google search results with Chinese-character keywords

Time: 2017-04-30 18:18:17

Tags: java web-scraping jsoup urlencode chinese-locale

I can't run a search with a Chinese keyword here, e.g.:

String search = "大學";

English keywords work fine here (the search succeeds).

I tried both UTF-8 and Big5 as the charset, but neither of them worked.
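To see what each charset actually puts into the query string, a minimal sketch like the following may help. The class name `EncodeDemo` and the helper `encode` are made up for illustration. The keyword is written with Unicode escapes so the result does not depend on the encoding javac assumes for the source file, and the Big5 line is only printed if the running JVM supports that charset.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.Charset;

// Hypothetical demo class, not part of the original program.
public class EncodeDemo {

    // Percent-encodes a keyword using the named charset.
    static String encode(String keyword, String charset) throws UnsupportedEncodingException {
        return URLEncoder.encode(keyword, charset);
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        // "\u5927\u5B78" is the same keyword as in the question, written as escapes.
        String search = "\u5927\u5B78";

        String utf8 = encode(search, "UTF-8");
        System.out.println("UTF-8: " + utf8); // prints "UTF-8: %E5%A4%A7%E5%AD%B8"

        // Big5 is an extended charset and may be missing on a minimal runtime.
        if (Charset.isSupported("Big5")) {
            System.out.println("Big5:  " + encode(search, "Big5"));
        }

        // Decoding with the same charset should give back the original keyword.
        System.out.println("Round trip OK: " + URLDecoder.decode(utf8, "UTF-8").equals(search));
    }
}
```

If the UTF-8 line prints something other than `%E5%A4%A7%E5%AD%B8` when the literal is typed directly instead of escaped, the source file encoding is a likely suspect. That is only one possible cause, not a confirmed diagnosis.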

Here is my code.

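import java.io.IOException;
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

// The class name is a placeholder; the original post shows only main().
public class GoogleNewsSearch {

    public static void main(String[] args) throws UnsupportedEncodingException, IOException {

        String[] line = new String[100]; // collected result titles
        final int[] score = { 0 };

        String google = "http://www.google.com/search?q=";

        String search = "大學";

        String charset = "UTF-8"; // neither UTF-8 nor Big5 works

        String news = "&tbm=nws"; // restrict the search to news results

        // tbs restricts results to the date range 1/1/2016 to 12/31/2016
        String string = google + URLEncoder.encode(search, charset) + news
                + "&tbs=cdr%3A1%2Ccd_min%3A1%2F1%2F2016%2Ccd_max%3A12%2F31%2F2016";
        String userAgent = "Chrome/57.0.2987.133";
        int numberOfResultpages = 10; // number of result pages to fetch
        int idx = 0;

        for (int i = 0; i < numberOfResultpages; i++) {
            // "start" is a result offset, not a page number; this assumes 10 results per page
            Document document = Jsoup.connect(string).userAgent(userAgent)
                    .data("start", String.valueOf(i * 10)).get();
            Elements links = document.select(".r>a");

            for (Element link : links) {
                String title = link.text();
                String url = link.absUrl("href"); // Google returns URLs in format "http://www.google.com/url?q=<url>&sa=U&ei=<someKey>".

                int begin = url.indexOf('=') + 1;
                int end = url.indexOf('&');
                if (end < begin) {
                    continue; // not in the expected redirect format
                }
                url = URLDecoder.decode(url.substring(begin, end), "UTF-8");

                if (!url.startsWith("http")) {
                    continue; // Ads/news/etc.
                }
                System.out.println("Title: " + title);
                System.out.println("URL: " + url);

                if (idx < line.length) { // avoid overflowing the fixed-size array
                    line[idx++] = title;
                }
            }
        }
    }
}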
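The comment in the code above says Google returns links in the form `http://www.google.com/url?q=<url>&sa=U&ei=<someKey>`. Cutting between the first `=` and the first `&` only works when `q` is the first parameter, so a slightly more defensive sketch could look for the `q` parameter by name. The class `RedirectTarget`, the method `extractTarget`, and the example.com link are made up for illustration.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

// Hypothetical helper, not part of the original program.
public class RedirectTarget {

    // Returns the decoded value of the "q" parameter, or null if the link has none.
    static String extractTarget(String href) throws UnsupportedEncodingException {
        int queryStart = href.indexOf('?');
        if (queryStart < 0) {
            return null; // no query string at all
        }
        for (String pair : href.substring(queryStart + 1).split("&")) {
            if (pair.startsWith("q=")) {
                return URLDecoder.decode(pair.substring(2), "UTF-8");
            }
        }
        return null; // query string present, but no "q" parameter
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        String href = "http://www.google.com/url?q=http%3A%2F%2Fexample.com%2Fa%3Fb%3D1&sa=U";
        System.out.println(extractTarget(href)); // prints "http://example.com/a?b=1"
    }
}
```

A link without a `q` parameter yields `null`, which the caller can skip in the same way the loop above skips URLs that do not start with `http`.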

0 answers:

No answers yet