Question

I have a script that clicks "show more" at the bottom of this page four times to reveal additional comment threads.

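Even though my XPath selects all of the "see 1 more reply... / see N more replies..." elements, the script never ends up clicking all of them. (At the time of writing, it clicks only 7 of the 13 elements.)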

XPath selector

//ui-view//a[contains(@class, "commentAction")]

Part of the script (it's long, so let me know if you want or need to see more):

tab_comments = browser.find_elements_by_xpath('//a[@gogo-test="comments_tab"]')

if len(tab_comments) > 0:

    browser.implicitly_wait(5)

    try:
        comments_count = int(''.join(filter(str.isdigit, str(tab_comments[0].text))))
    except ValueError:
        comments_count = 0

    if comments_count > 0:
        # 1. Switch to Comments Tab
        tab_comments[0].click()

        # 2. Expose Additional Threads
        show_more_comments = WebDriverWait(browser, 10).until(
            EC.element_to_be_clickable((By.XPATH, '//ui-view//a[text()="show more"]'))
        )

        clicks = 0
        while clicks <= 3:
            try:
                clicks += 1
                show_more_comments.click()
            except Exception:
                break

        # 3. Expand All Threads
        see_n_more_replies = browser.find_elements_by_xpath('//ui-view//a[contains(@class, "commentAction")]')
        for idx, see_replies in enumerate(see_n_more_replies):
            print('\n\n\n\nidx: ' + str(idx) + '\n\n\n\n')
            see_replies.click()

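Do the buttons need to be in view in order to be clicked? (That doesn't seem to be the case for the other buttons, but at this point I'm grasping at straws.)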

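The problem is that I parse the comments in step # 4. ..., and because the script does not expand every thread that has more than one reply, as it is supposed to, those fields end up missing from the log.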

No errors or exceptions are raised.

I'm using Firefox / geckodriver.

Answer 1

Run the following snippet to load all the comments on the page by clicking "show more" until the show more link disappears:

comment_pages = 0
no_of_comments = len(driver.find_elements_by_tag_name('desktop-comment'))
while True:
    show_more_link = driver.find_elements_by_partial_link_text('show more')
    if len(show_more_link) == 0:  # if the 'show more' link does not exist on the page
        break
    # before clicking on the link, it is important to bring the link inside the viewport. Otherwise `ElementNotVisible` exception is encountered
    driver.execute_script('arguments[0].scrollIntoView(true);', show_more_link[0])
    show_more_link[0].click()
    try:
        # wait for more comments to load by waiting till the comment count after clicking the button is greater than before the click
        WebDriverWait(driver, 10, poll_frequency=2).until(lambda x: len(driver.find_elements_by_tag_name('desktop-comment')) > no_of_comments)
    except Exception:  # typically a TimeoutException when no new comments load
        break
    no_of_comments = len(driver.find_elements_by_tag_name('desktop-comment'))
    comment_pages += 1
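
The click-then-wait pattern in the loop above can also be written as a small helper that does not depend on Selenium directly, which makes it easy to try out without a browser. This is only a sketch: the name `click_until_gone` and its parameters are hypothetical, and the three callables stand in for whatever the real page needs.

```python
import time


def click_until_gone(find_link, click, count_items,
                     timeout=10.0, poll=0.5, max_clicks=100):
    """Click a 'show more' style link until it disappears or stops loading items.

    find_link   -- callable returning the link element, or None once it is gone
    click       -- callable that scrolls to the element and clicks it
    count_items -- callable returning how many items are currently loaded
    Returns the number of clicks that actually loaded new items.
    """
    pages = 0
    while pages < max_clicks:
        link = find_link()
        if link is None:
            break  # the link no longer exists, so everything is loaded
        before = count_items()
        click(link)
        # Poll until the item count grows, giving up after `timeout` seconds.
        deadline = time.monotonic() + timeout
        while count_items() <= before:
            if time.monotonic() >= deadline:
                return pages  # nothing new appeared in time; stop here
            time.sleep(poll)
        pages += 1
    return pages
```

With Selenium, `find_link` could wrap `driver.find_elements_by_partial_link_text('show more')` and return the first match or `None`, `click` could call `scrollIntoView` before clicking, and `count_items` could return the number of `desktop-comment` elements.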

Once this loop finishes, the DOM contains all of the comments. After that, you can start actually scraping the page.

comments = driver.find_elements_by_tag_name('desktop-comment')
for comment in comments:
    author = comment.find_element_by_xpath(".//div[@class='commentLayout-header']/a[contains(@href, 'individuals')]").text
    print('Comment by person : ' + author)

    # expand the thread first, if it has a "more replies..." link
    has_more_replies = len(comment.find_elements_by_partial_link_text("more replies...")) > 0
    if has_more_replies:
        more_replies = comment.find_element_by_partial_link_text("more replies...")
        # scroll the link into the viewport before clicking it
        driver.execute_script('arguments[0].scrollIntoView()', more_replies)
        more_replies.click()
    reply_count = len(comment.find_elements_by_xpath(".//div[contains(@class, 'commentLayout-reply')]"))
    print('Number of replies to the comment : ' + str(reply_count))
    print('-------------------------------------------------------------------')

The output looks like this:

Comment by person : Jeff Rudd
Number of replies to the comment : 1
-------------------------------------------------------------------
Comment by person : Martin Boyle
Number of replies to the comment : 1
-------------------------------------------------------------------
Comment by person : John Bickerton
Number of replies to the comment : 1
-------------------------------------------------------------------
Comment by person : Mikkel Taanning
Number of replies to the comment : 2
-------------------------------------------------------------------
Comment by person : Christopher Sams
Number of replies to the comment : 2
-------------------------------------------------------------------
Comment by person : Marc Vieux
Number of replies to the comment : 2
-------------------------------------------------------------------

........................

You can modify the for loop to get more details about each comment.
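
One possible way to make the loop easier to extend is to return a record per comment instead of printing. The sketch below reuses only the two XPath expressions from the answer; the helper name `summarize_comment` and the dictionary keys are hypothetical, and any extra field would need a selector checked against the real page.

```python
def summarize_comment(comment):
    """Return the author and reply count for one desktop-comment element.

    `comment` is expected to behave like a Selenium WebElement that offers
    find_element_by_xpath and find_elements_by_xpath, as used in the answer.
    """
    author = comment.find_element_by_xpath(
        ".//div[@class='commentLayout-header']/a[contains(@href, 'individuals')]"
    ).text
    replies = comment.find_elements_by_xpath(
        ".//div[contains(@class, 'commentLayout-reply')]"
    )
    # Further fields can be added to this dict as needed.
    return {"author": author, "reply_count": len(replies)}
```

Inside the for loop, after any "more replies..." link has been clicked, `summarize_comment(comment)` could be appended to a list and written out once all comments have been processed.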
