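I have a script that clicks "show more" at the bottom of this page four times to reveal additional comment threads.
Even though my XPath selects all of the "see 1 more reply... / see N more replies..." elements, the script never ends up clicking all of them. (At the time of writing, it clicks only 7 of the 13 elements.)
XPath selector: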
//ui-view//a[contains(@class, "commentAction")]
Part of the script (it is long, so let me know if you want or need to see more):
tab_comments = browser.find_elements_by_xpath('//a[@gogo-test="comments_tab"]')
if len(tab_comments) > 0:
    browser.implicitly_wait(5)
    try:
        comments_count = int(''.join(filter(str.isdigit, str(tab_comments[0].text))))
    except ValueError:
        comments_count = 0
    if comments_count > 0:
        # 1. Switch to Comments Tab
        tab_comments[0].click()
        # 2. Expose Additional Threads
        show_more_comments = WebDriverWait(browser, 10).until(
            EC.element_to_be_clickable((By.XPATH, '//ui-view//a[text()="show more"]'))
        )
        clicks = 0
        while clicks <= 3:
            try:
                clicks += 1
                show_more_comments.click()
            except Exception:
                break
        # 3. Expand All Threads
        see_n_more_replies = browser.find_elements_by_xpath('//ui-view//a[contains(@class, "commentAction")]')
        for idx, see_replies in enumerate(see_n_more_replies):
            print('\n\n\n\nidx: ' + str(idx) + '\n\n\n\n')
            see_replies.click()
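Do the buttons need to be in view before they can be clicked? (That doesn't seem to be the case for the other buttons, but at this point I'm grasping at straws.)
The problem is that I parse the comments in step # 4. ..., and because the script does not expand every thread that has more than one reply, which is what it is supposed to do, those fields come up empty in the log.
No errors or exceptions are raised.
I'm using Firefox / geckodriver.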
Answer 0 (score: 1)
Run the following snippet to load all of the comments on the page by clicking "show more" until the link disappears:
comment_pages = 0
no_of_comments = len(driver.find_elements_by_tag_name('desktop-comment'))
while True:
    show_more_link = driver.find_elements_by_partial_link_text('show more')
    if len(show_more_link) == 0:  # if the 'show more' link does not exist on the page
        break
    # before clicking on the link, it is important to bring the link inside the viewport. Otherwise `ElementNotVisible` exception is encountered
    driver.execute_script('arguments[0].scrollIntoView(true);', show_more_link[0])
    show_more_link[0].click()
    try:
        # wait for more comments to load by waiting till the comment count after clicking the button is greater than before the click
        WebDriverWait(driver, 10, poll_frequency=2).until(lambda x: len(driver.find_elements_by_tag_name('desktop-comment')) > no_of_comments)
    except:
        break
    no_of_comments = len(driver.find_elements_by_tag_name('desktop-comment'))
    comment_pages += 1
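The wait inside that loop amounts to polling a condition until it becomes true or a timeout expires. Here is a minimal, browser-free sketch of that idea. `wait_until` and its parameters are hypothetical names used only for illustration, and this is not Selenium's implementation: `WebDriverWait.until` raises `TimeoutException` on timeout, whereas this sketch simply returns `False`.

```python
import time


def wait_until(condition, timeout=10.0, poll_frequency=2.0):
    """Poll condition() until it returns a truthy value or timeout elapses.

    Hypothetical helper for illustration only.
    Returns True if the condition was met, False if the wait timed out.
    """
    deadline = time.monotonic() + timeout
    while True:
        if condition():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_frequency)


# Simulated comment counts: the count only grows on the third poll.
counts = iter([5, 5, 8])
before = 5
print(wait_until(lambda: next(counts) > before, timeout=1.0, poll_frequency=0.01))
```

In the real loop the condition would compare the current number of `desktop-comment` elements with the count recorded before the click.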
Once the "show more" loop has finished, the DOM will contain all of the comments. After that, you can start actually scraping the page:
comments = driver.find_elements_by_tag_name('desktop-comment')
for comment in comments:
    author = comment.find_element_by_xpath(".//div[@class='commentLayout-header']/a[contains(@href, 'individuals')]").text
    print('Comment by person : ' + author)
    # expand the thread first if it has a "more replies..." link
    has_more_replies = len(comment.find_elements_by_partial_link_text("more replies...")) > 0
    if has_more_replies:
        more_replies = comment.find_element_by_partial_link_text("more replies...")
        driver.execute_script('arguments[0].scrollIntoView()', more_replies)
        more_replies.click()
    reply_count = len(comment.find_elements_by_xpath(".//div[contains(@class, 'commentLayout-reply')]"))
    print('Number of replies to the comment : ' + str(reply_count))
    print('-------------------------------------------------------------------')
The output looks like this:
Comment by person : Jeff Rudd
Number of replies to the comment : 1
-------------------------------------------------------------------
Comment by person : Martin Boyle
Number of replies to the comment : 1
-------------------------------------------------------------------
Comment by person : John Bickerton
Number of replies to the comment : 1
-------------------------------------------------------------------
Comment by person : Mikkel Taanning
Number of replies to the comment : 2
-------------------------------------------------------------------
Comment by person : Christopher Sams
Number of replies to the comment : 2
-------------------------------------------------------------------
Comment by person : Marc Vieux
Number of replies to the comment : 2
-------------------------------------------------------------------
........................
You can modify the for loop to extract more details about each comment.
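The pattern the snippets above rely on is to look the link up again before every click, rather than reusing a reference found earlier, and to stop once the lookup comes back empty. It can be sketched without a browser. `click_until_gone`, `FakeLink`, `FakePage`, and the `max_clicks` safety cap are hypothetical names introduced for illustration. In a real script, `find_links` would wrap a call such as `driver.find_elements_by_partial_link_text('show more')`.

```python
def click_until_gone(find_links, max_clicks=100):
    """Click the first link returned by find_links() until none remain.

    find_links is any zero-argument callable returning a list of objects
    that expose a click() method. Returns the number of clicks performed.
    """
    clicks = 0
    while clicks < max_clicks:
        links = find_links()  # re-query on every pass; never reuse old elements
        if not links:
            break
        links[0].click()
        clicks += 1
    return clicks


class FakeLink:
    """Stand-in for a WebElement: clicking it consumes one remaining page."""

    def __init__(self, page):
        self.page = page

    def click(self):
        self.page.pages_left -= 1


class FakePage:
    """Stand-in for the browser: exposes a link while pages remain."""

    def __init__(self, pages_left):
        self.pages_left = pages_left

    def find_links(self):
        return [FakeLink(self)] if self.pages_left > 0 else []


page = FakePage(pages_left=4)
print(click_until_gone(page.find_links))  # prints 4
```

The `max_clicks` cap is a design choice of this sketch, not part of the code above: it keeps the loop from running forever if a link never disappears.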