Question

I'm having some trouble figuring out how to select only the last 60 elements on a page.

posts = driver.find_elements_by_xpath("""(//div[@class='hotProductDetails'])""")
for post in posts:
    print(post.text)

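This code prints the text of every one of these elements on the page. However, the site I'm trying to scrape has a 'load more' button.

The 'load more' button loads 60 more products, and I want my code to grab only those products. That way I can put everything in a loop: click the button, scrape the products it loaded, append them to a Pandas DataFrame, and repeat for a set number of iterations.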

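I haven't been able to get any code to do this for me, and once the 'load more' button has been pressed many times, grabbing all the elements can kill Chrome and, in turn, my script.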

"(//div[@class='hotProductDetails'])[position() > {} and position() <= {}])".format ((page -1 ) * 50, page * 50)

Someone shared this code with me, but it crashes with the following error:

invalid selector: Unable to locate an element with the xpath expression (//div[@class='hotProductDetails'])[position() > {} and position() <= {}])".format ((page -1 ) * 50, page * 50 because of the following error:
SyntaxError: Failed to execute 'evaluate' on 'Document': The string '(//div[@class='hotProductDetails'])[position() > {} and position() <= {}])".format ((page -1 ) * 50, page * 50' is not a valid XPath expression.
  (Session info: chrome=60.0.3112.90)
  (Driver info: chromedriver=2.31.488763 (092de99f48a300323ecf8c2a4e2e7cab51de5ba8),platform=Windows NT 10.0.14393 x86_64)

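This is my first web scraping project and my first time using Selenium (an amazing package, I'm very impressed with it), and I don't know how to fix this. I suspect it has something to do with the 'page' part of the code, since everything lives on the same web page, which simply grows longer as you load more products.

I can share the website I'm scraping if that helps. As I said, this is my first scraping project, and it's for a company I just joined, so I don't know whether they'd be unhappy about me sharing it.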

Answer 1

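If you get an invalid XPath selector error, something is wrong with the expression itself. Here there is an extra ")" at the end. The following works for me: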

page = 2

xpath_selector = "(//div[@class='hotProductDetails'])[position() > {} and position() <= {}]".format((page - 1) * 50, page * 50)
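
To make the range arithmetic easier to check, the selector can be built by a small helper. This is only a sketch: `build_page_xpath` and its `page_size` parameter are names made up for illustration. Note that the error message in the question contains the literal `{}` placeholders and the `.format` text, which suggests the `.format` call ended up inside the quoted string instead of being applied to it. Also, the question mentions 60 products per click while the snippet above uses 50, so the batch size is left as a parameter.

```python
def build_page_xpath(page, page_size=50, base="//div[@class='hotProductDetails']"):
    """Return an XPath selecting the page-th batch (1-based) of matching elements."""
    start = (page - 1) * page_size  # exclusive lower bound
    end = page * page_size          # inclusive upper bound
    # .format() is applied to the string (outside the quotes), so the
    # placeholders are filled in before the selector reaches the browser.
    return "({})[position() > {} and position() <= {}]".format(base, start, end)


print(build_page_xpath(2))
# (//div[@class='hotProductDetails'])[position() > 50 and position() <= 100]
```

With 60 products per click, `build_page_xpath(2, page_size=60)` would select elements 61 through 120.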

Also, if all you want is the last 60 elements, you can simply use the following:

xpath_selector = "(//div[@class='hotProductDetails'])[position() > last() - 60]"
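
A rough sketch of how this selector might be used to read the text of the newest batch. The helper names `last_n_xpath` and `last_n_texts` are hypothetical, and `driver` is assumed to be an already started Selenium WebDriver. The locator strategy is passed as the plain string `"xpath"`, which is the value of Selenium's `By.XPATH` constant. One caveat: if the final click loads fewer than 60 products, `last() - 60` will reach back into items that were already scraped.

```python
def last_n_xpath(n, base="//div[@class='hotProductDetails']"):
    """Return an XPath selecting the last n elements matched by base."""
    return "({})[position() > last() - {}]".format(base, n)


def last_n_texts(driver, n=60):
    """Return the text of the last n product elements currently on the page."""
    # "xpath" is the string value of selenium's By.XPATH constant.
    elements = driver.find_elements("xpath", last_n_xpath(n))
    return [element.text for element in elements]
```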

Answer 2

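If the newly loaded elements are appended to the same div that was filled on the initial page load, why not keep track of where your data starts and ends?

For example, if items 1-10 load by default and I click "load more" so that the div now holds 20 elements, I know I should only look at items 11-20, and so on. This is usually how I've solved this problem in the past.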
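
One way this bookkeeping could look, as a sketch under several assumptions: `scrape_new_items`, `load_more_xpath`, and `pause` are hypothetical names, `driver` is an already started Selenium WebDriver, and the fixed sleep is a crude stand-in for a proper wait such as Selenium's `WebDriverWait`. The count of items already handled is fed into a `position()` filter, so each pass only asks the browser for the elements that were added since the previous one.

```python
import time

ITEM_XPATH = "//div[@class='hotProductDetails']"


def scrape_new_items(driver, clicks, load_more_xpath, pause=2.0):
    """Collect item texts, reading only the elements added since the last pass."""
    texts = []
    seen = 0  # number of items already scraped
    for i in range(clicks + 1):
        # Select only the elements positioned after the ones already handled.
        selector = "({})[position() > {}]".format(ITEM_XPATH, seen)
        new_items = driver.find_elements("xpath", selector)
        texts.extend(element.text for element in new_items)
        seen += len(new_items)
        if i < clicks:
            # Click "load more" and give the page time to add the next batch.
            driver.find_element("xpath", load_more_xpath).click()
            time.sleep(pause)
    return texts
```

The returned list can then be turned into a DataFrame in one step, rather than appending to it inside the loop.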

Selenium, get elements by XPath - scrape only the last 60 elements on the page
