Question

I am new to XPath and am trying to scrape a website using XPath expressions in Scrapy. The structure of the page I am trying to scrape is -

...
<div class="article-body">
<p class="body">Text1</p>
<p class="body">Text2</p>
<p class="body">Text3</p>
...

The XPath I am trying is -

//div[@class="article-body"]/p/text()

But all I get in my database is Text1. Instead, I want the output to be -

Text1.Text2.Text3

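I think I should use concat, string-join, or a similar function, but I can't work out how. Since I have to pass this XPath expression as a parameter in Scrapy, I need it to be a single expression.

I tried passing the concat function to my django-scraper as -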

concat(//div[@class="article-body"]/p)

But it throws this error -

File "C:\Anaconda2\lib\site-packages\scrapy\selector\unified.py", line 100, in xpath
    raise ValueError(msg if six.PY3 else msg.encode("unicode_escape"))

I get the same error when I try (there are no other <p> elements on the page) -

concat(//p)

or

string-join(//p)

However, when I try string(//p), I get Text1 in my database.

Answer 1

Try this:

concat(//div[@class="article-body"]/p)

String values = myTestDriver.findElement(By.xpath("concat(//div[@class='article-body']/p)")).getText();
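A caveat on the concat expression above: in XPath 1.0, which Scrapy's selectors evaluate through lxml, concat() takes two or more arguments, and string-join() is an XPath 2.0 function that is not available. That would likely explain the ValueError in the question. string(//p) converts a node-set to the string value of its first node, which is why only Text1 comes back. If the number of paragraphs is known in advance, one possible single-expression workaround is to pass each paragraph to concat() explicitly. The sketch below only builds such an expression string. The helper name build_concat_xpath and its parameters are hypothetical, and it assumes the <p> elements are siblings under one div and that the separator contains no double quote.

```python
def build_concat_xpath(base, count, sep="."):
    """Build an XPath 1.0 expression joining the first `count` nodes matched by `base`.

    Hypothetical helper for illustration; not part of Scrapy or lxml.
    """
    if count < 1:
        raise ValueError("count must be at least 1")
    if count == 1:
        # concat() needs at least two arguments, so use string() for one node.
        return "string({}[1])".format(base)
    parts = []
    for i in range(1, count + 1):
        if i > 1:
            # Separator as an XPath string literal between consecutive nodes.
            parts.append('"{}"'.format(sep))
        # Positional predicate picks the i-th <p> among its siblings.
        parts.append("{}[{}]".format(base, i))
    return "concat({})".format(", ".join(parts))


expr = build_concat_xpath("//div[@class='article-body']/p", 3)
print(expr)
```

The resulting string can be passed wherever a single XPath expression is expected. A paragraph that does not exist contributes an empty string, so the separators would still appear.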

OR

You need to do something like this

    // Requires java.util.ArrayList, java.util.List, org.openqa.selenium.By and org.openqa.selenium.WebElement.
    List<String> name = new ArrayList<>();
    List<WebElement> options = myTestDriver.findElements(By.xpath("//div[@class='article-body']/p"));
    System.out.println(options.size());
    for (int i = 0; i < options.size(); i++) {
        // Collect the visible text of each <p> element.
        String name1 = options.get(i).getText();
        System.out.println(name1);
        name.add(name1);
    }

Now you can concatenate them.
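Since the question is about Scrapy rather than Selenium, the same select-then-join idea can be sketched in Python. This is a minimal illustration, not Scrapy code: it uses the standard library's xml.etree.ElementTree as a stand-in so it runs without extra packages, and the sample markup is modelled on the page structure in the question. In a Scrapy callback the analogous step would probably be calling .getall() (or .extract() on older versions) on the selector for //div[@class="article-body"]/p/text() and joining the resulting list.

```python
import xml.etree.ElementTree as ET

# Sample markup modelled on the page structure in the question (assumed, well-formed).
doc = """
<html><body>
<div class="article-body">
<p class="body">Text1</p>
<p class="body">Text2</p>
<p class="body">Text3</p>
</div>
</body></html>
"""

root = ET.fromstring(doc)

# Select every <p> under the article body; ElementTree supports this XPath subset.
paragraphs = root.findall(".//div[@class='article-body']/p")

# Take the full text of each paragraph and drop empty ones.
texts = ["".join(p.itertext()).strip() for p in paragraphs]
texts = [t for t in texts if t]

# Join in Python instead of inside the XPath expression.
joined = ".".join(texts)
print(joined)  # Text1.Text2.Text3
```

Joining outside XPath avoids the fixed argument count of concat(), at the cost of needing a place to post-process the result.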

How to select multiple text nodes as a single string using an XPath expression?
