Question

I am trying to use JSoup, and I can't get past the second Scanner prompt. The program jumps straight into my catch statement.

Here is a description of the program:

I take a Google search term as user input (a String). Next, I ask the user how many query results they want to see, and read that as an integer.

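I loop through each returned element and add it to an ArrayList. The string shown on the console consists of the index, the link text, and the hyperlink.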

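Then I want to ask the user which index they would like to enter, so that a browser window opens on that link. This is done by running the Linux terminal command "xdg-open" concatenated with the hRef string, using the Runtime class.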

Everything works fine up to the point where it is time to ask which index to select.

Here is my code:

/**
 * Created by christopher on 4/26/16.
 */

import java.io.IOException;
import java.util.ArrayList;
import java.util.Scanner;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;


public class GoogleSearchJava {

    static int index;
    static String linkHref;
    static Scanner input;

    public static final String GOOGLE_SEARCH_URL = "https://www.google.com/search";

    public static void main(String[] args) throws IOException {

        //GET INPUT FOR SEARCH TERM

        input = new Scanner(System.in);
        System.out.print("Search: ");
        String searchTerm = input.nextLine();
        System.out.print("Enter number of query results: ");
        int num = input.nextInt();

        String searchURL = GOOGLE_SEARCH_URL + "?q=" + searchTerm + "&num=" + num;

        //NEED TO DEFINE USER AGENT TO PREVENT 403 ERROR.
        Document document = Jsoup.connect(searchURL).userAgent("Mozilla/5.0").get();

        //OPTION TO DISPLAY HTML FILE IN BROWSER. DON'T KNOW YET.
        //System.out.println(document.html());

        //If google search results HTML change the <h3 class="r" to <h3 class ="r1"
        //need to change below stuff accordingly
        Elements results = document.select("h3.r > a");

        index = 0;
        String news = "News";
        ArrayList<String> displayResults = new ArrayList<>();
        for (Element result : results) {
            index++;
            linkHref = result.attr("href");
            String linkText = result.text();
            String pingResult = index + ": " + linkText + ", URL:: " + linkHref.substring(6, linkHref.indexOf("&")) + "\n";

            if (pingResult.contains(news)) {
                System.out.println("FOUND " + "\"" + linkText + "\"" + "NO HYPERTEXT FOR NEWS QUERY RESULTS AT THIS TIME. SKIPPED INDEX.");
                System.out.println();
            } else {
                displayResults.add(pingResult);
            }
        }
        for(String urlString : displayResults) {
            System.out.println(urlString);
        }
        System.out.println();

        goToURL(input, displayResults);
    }
    public static int goToURL(Scanner input, ArrayList<String> resultList) {

        int newIndex = 0;

        try {

            System.out.print("Enter Index (i.e. 1, 2, etc) you wish to visit, 0 to exit: ");

            newIndex = input.nextInt();
            input.nextLine();

            for (String string : resultList) {

                if(string.startsWith(String.valueOf(newIndex))) {

                    Process process = Runtime.getRuntime().exec("xdg-open " + string.substring(6, string.indexOf("&")));
                    process.waitFor();
                }
            }
        } catch (Exception e) {
            System.out.println("ERROR while parsing URL");
        }
        return newIndex;
    }
}

Here is the output. Note that I entered "1", and no, I had not pressed "0" yet:

Search: Oracle
Enter number of query results: 3
1: Oracle | Integrated Cloud Applications and Platform Services, URL:: =http://www.oracle.com/

2: Oracle Corporation - Wikipedia, the free encyclopedia, URL:: =https://en.wikipedia.org/wiki/Oracle_Corporation

3: Oracle on the Forbes America's Best Employers List, URL:: =http://www.forbes.com/companies/oracle/


Enter Index (i.e. 1, 2, etc) you wish to visit, 0 to exit: 1
ERROR while parsing URL

Process finished with exit code 0

Answer 1

"ERROR while parsing URL" indicates that the error comes from

try {

    System.out.print("Enter Index (i.e. 1, 2, etc) you wish to visit, 0 to exit: ");

    newIndex = input.nextInt();
    input.nextLine();

    for (String string : resultList) {

        if(string.startsWith(String.valueOf(newIndex))) {

            Process process = Runtime.getRuntime().exec("xdg-open " + string.substring(6, string.indexOf("&")));
            process.waitFor();
        }
    }
} catch (Exception e) {
    System.out.println("ERROR while parsing URL");
}

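I am not working on Linux, so I can't test this, but I suspect the problem is that your URL starts with = (notice that your console output contains URL:: =... while your print statement has no =, so it is part of the address you are trying to visit).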

So in .substring(6, hRef.indexOf("&")), change the 6 to 7.
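The off-by-one can be seen in a small sketch. It assumes the href returned by Google has the shape "/url?q=<target>&<other params>", which is inferred from the console output above and not confirmed by the question; the class name, the helper extractTarget, and the sample href are all hypothetical. Note also that indexOf returns -1 when no "&" is present, and substring(6, -1) then throws a StringIndexOutOfBoundsException. The display strings in the sample output contain no "&", so this may well be what lands in the catch block; the sketch guards against that case.

```java
public class HrefOffsetDemo {

    // Hypothetical helper: extracts the target URL from an href shaped like
    // "/url?q=<target>&<other params>" (assumed format, inferred from the output).
    static String extractTarget(String href) {
        String prefix = "/url?q=";            // 7 characters, so the target starts at index 7
        if (!href.startsWith(prefix)) {
            return href;                      // not the assumed shape: return it unchanged
        }
        int end = href.indexOf('&');
        if (end < 0) {
            end = href.length();              // no '&': take the rest of the string
        }
        return href.substring(prefix.length(), end);
    }

    public static void main(String[] args) {
        String href = "/url?q=http://www.oracle.com/&sa=U"; // sample value (assumption)

        // Offset 6 still includes the '=' that precedes the target.
        System.out.println(href.substring(6, href.indexOf('&'))); // =http://www.oracle.com/

        // Offset 7 (the prefix length) yields the clean URL.
        System.out.println(extractTarget(href));                  // http://www.oracle.com/
    }
}
```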

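The other problem is that hRef is set to linkHref, which will be the last result from Google rather than the one you selected. You should either create your own class that stores the correct href together with its description, or pass a list of the Element objects representing the <a ...>..</a> tags. You also don't need to match list entries against the 1: ... format: if you want to map 1 to index 0, 2 to index 1, and so on, just use list.get(index - 1).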
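A minimal sketch of that suggestion follows. The class SearchResult, the method pick, and the sample entries are hypothetical names introduced for illustration; in the real program the list would be filled inside the JSoup loop.

```java
import java.util.ArrayList;
import java.util.List;

public class ResultSelectionDemo {

    // Hypothetical holder for one search hit: the link text and its target URL.
    static final class SearchResult {
        final String text;
        final String url;

        SearchResult(String text, String url) {
            this.text = text;
            this.url = url;
        }
    }

    // Maps the 1-based index typed by the user to the stored result,
    // or returns null when the index is out of range (including 0 for "exit").
    static SearchResult pick(List<SearchResult> results, int userIndex) {
        if (userIndex < 1 || userIndex > results.size()) {
            return null;
        }
        return results.get(userIndex - 1);
    }

    public static void main(String[] args) {
        List<SearchResult> results = new ArrayList<>();
        results.add(new SearchResult("Oracle | Integrated Cloud Applications and Platform Services",
                "http://www.oracle.com/"));
        results.add(new SearchResult("Oracle Corporation - Wikipedia, the free encyclopedia",
                "https://en.wikipedia.org/wiki/Oracle_Corporation"));

        // Display with 1-based numbering; the number is never parsed back out of a string.
        for (int i = 0; i < results.size(); i++) {
            System.out.println((i + 1) + ": " + results.get(i).text);
        }

        SearchResult chosen = pick(results, 2);
        System.out.println(chosen.url); // https://en.wikipedia.org/wiki/Oracle_Corporation
    }
}
```

Because each entry keeps its own URL, the static linkHref field and the startsWith check on the display string are no longer needed. That check would also match "10: ..." when the user types 1.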

One last suggestion: instead of trying to execute xdg-open, you can make your code more OS-independent by using the solution described in "How to open the default webbrowser using java".
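One common way to do this is the standard java.awt.Desktop API, sketched below. This is an assumption about what the linked solution recommends, not a quote from it; the class name and the helpers toWebUri and open are hypothetical. Desktop is not available in headless environments, so the sketch checks for support before using it.

```java
import java.awt.Desktop;
import java.io.IOException;
import java.net.URI;
import java.net.URISyntaxException;

public class OpenInBrowserDemo {

    // Parses the string and accepts only http/https, so arbitrary input is not handed to the OS.
    static URI toWebUri(String url) throws URISyntaxException {
        URI uri = new URI(url);
        String scheme = uri.getScheme();
        if (!"http".equalsIgnoreCase(scheme) && !"https".equalsIgnoreCase(scheme)) {
            throw new IllegalArgumentException("Not an http(s) URL: " + url);
        }
        return uri;
    }

    // Opens the URL in the default browser; returns false when the platform does not support it.
    static boolean open(String url) throws IOException, URISyntaxException {
        URI uri = toWebUri(url);
        if (!Desktop.isDesktopSupported()) {
            return false;
        }
        Desktop desktop = Desktop.getDesktop();
        if (!desktop.isSupported(Desktop.Action.BROWSE)) {
            return false;
        }
        desktop.browse(uri);
        return true;
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.out.println("Usage: java OpenInBrowserDemo <url>");
            return;
        }
        if (!open(args[0])) {
            System.out.println("Opening a browser is not supported on this platform.");
        }
    }
}
```

In goToURL, a call such as open(chosenUrl) could replace the Runtime.getRuntime().exec("xdg-open " + ...) line, with xdg-open kept as a fallback when open returns false.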
