Question

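Hi Java experts, I am trying to extract data from a given URL where the information is hidden under a "div id". My query page looks like this:

I enter a peptide sequence as my query and then click the "Search Datasets" button to see the results as a table.

However, when I use "View Page Source" to look at the results as HTML, the table is not there.

Using Firebug I can see the table in the HTML, and it looks like this:

[image: screenshot of the table in Firebug]

To fetch the data for my query, I wrote this simple Java program: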

package retrieve.information;
import java.io.IOException;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class DemoExtractHidenHtml {
    public static void main(String[] args) {
        Document document;
        try {
            document = Jsoup.connect("http://example.com/xyz_proxi.jsp#{\"searched_button\":\"datasets\",\"peptide\":\"NLAVSQVVHK\"}").userAgent("Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/535.21 (KHTML, like Gecko) Chrome/19.0.1042.0 Safari/535.21").get();
            Element dataset = document.select("td.table[datasets]_row[0]_column[1]").first();
            System.out.println(dataset);
        } catch (IOException e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        }

    }

}

Of course it does not work for me; I get the following error:

Exception in thread "main" org.jsoup.select.Selector$SelectorParseException: Could not parse query 'td.table[datasets]_row[0]_column[1]': unexpected token at '_row[0]_column[1]'
at org.jsoup.select.QueryParser.findElements(QueryParser.java:196)
at org.jsoup.select.QueryParser.parse(QueryParser.java:65)
at org.jsoup.select.QueryParser.parse(QueryParser.java:39)
at org.jsoup.select.Selector.<init>(Selector.java:84)
at org.jsoup.select.Selector.select(Selector.java:106)
at org.jsoup.nodes.Element.select(Element.java:286)
at retrieve.information.DemoExtractHidenHtml.main(DemoExtractHidenHtml.java:14)

Does anyone know how to get past this? I am new to Java.

Answer 1

If you can see the table in Firebug, copy its selector (CSS path) and use it like this:

document.select(selector_str);
document.select("#rso > div > div:nth-child(1) > div > h3 > a");
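
One caveat worth checking, offered as a sketch rather than a definitive diagnosis: a selector copied from Firebug describes the live DOM after scripts have run, whereas jsoup only parses the HTML the server returns. In the question's URL the search parameters sit after `#`, and HTTP clients do not send that fragment part to the server, so the fetched page probably cannot contain the result table. The stdlib-only example below (the names `FragmentDemo`, `pageUri`, and `requestTarget` are made up for illustration) separates the two parts of the URL:

```java
import java.net.URI;
import java.net.URISyntaxException;

// Illustrative only: shows which part of the question's URL reaches the server.
class FragmentDemo {

    // Builds the query page URI. The multi-argument constructor percent-encodes
    // characters such as '{' and '"' that are not legal in a raw URI.
    static URI pageUri(String fragment) {
        try {
            return new URI("http", "example.com", "/xyz_proxi.jsp", fragment);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    // The request target an HTTP client sends: path plus optional query.
    // The fragment is left out on purpose, because clients do not transmit it.
    static String requestTarget(URI uri) {
        String path = uri.getRawPath();
        if (path == null || path.isEmpty()) {
            path = "/";
        }
        String query = uri.getRawQuery();
        return query == null ? path : path + "?" + query;
    }

    public static void main(String[] args) {
        URI uri = pageUri("{\"searched_button\":\"datasets\",\"peptide\":\"NLAVSQVVHK\"}");
        System.out.println("sent to server : " + requestTarget(uri));
        System.out.println("kept in browser: " + uri.getFragment());
    }
}
```

If the peptide only ever appears in the fragment, the server sees a plain request for `/xyz_proxi.jsp`, and the table has to be produced later by JavaScript in the browser.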

Answer 2

Hi, I solved this problem with Selenium. Here is my solution:

package extract.data;
import java.util.Scanner;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.firefox.FirefoxDriver;

public class ExtractDataDynamic {
private static Scanner kb;

public static void main(String[] args) {
    // TODO Auto-generated method stub
    kb = new Scanner(System.in);
    String userpepseq;
    userpepseq = kb.nextLine();
    if (userpepseq.trim().isEmpty()){
        System.out.println("User didn't input any value!");
    } else {
        if (Pattern.matches("[a-zA-Z]+", userpepseq) == true) {
            WebDriver drivermassid = new FirefoxDriver();
            drivermassid.manage().window().maximize();
            drivermassid.get("http://exmaple.com/xyz_proxi.jsp#{\"searched_button\":\"datasets\",\"peptide\":\""+userpepseq+"\"}");
            //Here we are storing the value from the cell in to the string variable
            String sCellValuemassid = drivermassid.findElement(By.xpath(".//*[@class='result']/tbody")).getText();
            drivermassid.quit();
            if (sCellValuemassid.length() > 0){
                String mid="";
                String status="";
                Pattern pattern = Pattern.compile("MSV\\d+\\s+\\d+\\s+");
                Matcher macther= pattern.matcher(sCellValuemassid);
                while (macther.find()){
                    mid=((macther.group()).split("\\ "))[0];
                    status=((macther.group()).split("\\ "))[1];
                }
                if (mid.length() > 0 ){
                    WebDriver drivermasspro = new FirefoxDriver();
                    drivermasspro.manage().window().maximize();
                    drivermasspro.get("http://exmaple.com/xyz_proxi.jsp#{\"searched_button\":\"proteins\",\"peptide\":\""+userpepseq+"\"}");
                    String sCellValuemasspro = drivermasspro.findElement(By.xpath(".//*[@class='result']/tbody")).getText();
                    drivermasspro.quit();
                    if (sCellValuemasspro.length() > 0){
                        String [] proteinifo = sCellValuemasspro.split("\\n");
                        for (int i=0;i<proteinifo.length;i++) {
                            String [] subproteinifo = proteinifo[i].split("\\ ");
                            System.out.println(mid+" "+status+" "+subproteinifo[1]);
                        }
                    }
                } else {
                    System.out.println(" ID doesn't exist for "+userpepseq +".");
                }
            } else {
                System.out.println(userpepseq+" doesn't exist in database.");
            }


        } else {
            System.out.println(userpepseq+" should not contain any number!");
        }
    }
}
}

Because the table is dynamic and is filled in by JavaScript, this is one of the ways I found to solve my problem. Thanks.
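
The regex step in the code above can be tried in isolation, without a browser. The following is a minimal sketch, assuming the table text contains rows that start with an `MSV` identifier followed by a number; the sample text, the class name `DatasetRowParser`, and the method `lastIdAndStatus` are hypothetical. It uses capture groups instead of splitting on a single space, so tabs or repeated spaces between the columns do not shift the indexes:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative only: parses text shaped like the output of getText() on the result table.
class DatasetRowParser {

    // Same shape as the pattern in the answer, with capture groups added
    // for the identifier and the number that follows it.
    private static final Pattern ROW = Pattern.compile("(MSV\\d+)\\s+(\\d+)\\s+");

    // Returns {id, status} from the last matching row, or null if no row matches.
    // Keeping the last match mirrors the while loop in the answer's code.
    static String[] lastIdAndStatus(String tableText) {
        Matcher matcher = ROW.matcher(tableText);
        String[] result = null;
        while (matcher.find()) {
            result = new String[] { matcher.group(1), matcher.group(2) };
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical table text; the real page may lay out its columns differently.
        String sample = "MSV000012345\t1 first dataset\nMSV000067890  2 second dataset";
        String[] row = lastIdAndStatus(sample);
        if (row == null) {
            System.out.println("No dataset row found.");
        } else {
            System.out.println(row[0] + " " + row[1]);
        }
    }
}
```

Returning `null` for "no match" keeps the sketch short; the caller then decides what to print, as the `else` branches in the answer do.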

Extracting values hidden under a DIV id

2 answers: