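I am scraping IMDB movie reviews.
There is one problem: to load more reviews, the [read-more] button has to be pressed.
However, once the reviews run out, I don't know how to detect that and stop.
At the moment I handle this by "polling". Is there a smarter way to handle it?
When there is more to read:
When there is nothing more to read:
Thanks!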
Answer 0 (score: 0)
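If you are writing this in Python, you can use XPath to extract elements from the HTML page; an example that retrieves reviews is given below. You can wrap each lookup in try/except so that the loop ends when the page has nothing more to read. Take a look at the example below, it may help: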
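from selenium.common.exceptions import NoSuchElementException
from selenium.webdriver.common.by import By

# `driver` is an already-initialised Selenium WebDriver, and `page` is
# defined by the surrounding code (neither is shown in this snippet).
reviews = driver.find_elements(By.XPATH, '//article[@itemprop = "review"]')
for review in reviews:
    # Initialize an empty dictionary for each review
    review_dict = {}
    # Find xpaths of the fields desired as columns in future data frame
    # We use the try/except statements to account for the fact that the reviews are not required to have
    # all the fields listed below, and if a review does not have a certain field we wish to make the
    # corresponding field blank in that particular row, rather than quit upon receiving an error.
    try:
        airline = review.find_element(
            By.XPATH,
            '//div[@class = "review-heading"]//h1[@itemprop = "name"]').text
    except NoSuchElementException:
        # Fall back to the page identifier when the heading is missing
        airline = page
    try:
        overall = review.find_element(By.XPATH, './/span[@itemprop = "ratingValue"]').text
    except NoSuchElementException:
        # Leave the rating blank when the review has none
        overall = ""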
In the same way, you can use XPath lookups for the IMDB case and wrap them in try/except, so that no error is raised when there is nothing left to read.
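
Applied to the original question, the same try/except idea could replace the polling loop: keep clicking the button until looking it up fails. The following is only a rough sketch under assumptions. The helper name `click_until_missing` and its parameters are made up for illustration, and the XPath in the comments is a placeholder, not IMDB's real markup. The helper takes a lookup function, so it does not depend on Selenium itself.

```python
import time


def click_until_missing(find_button, not_found=(LookupError,), pause=0.0, max_clicks=500):
    """Click the button returned by find_button() until the lookup fails.

    find_button: callable that returns a clickable object or raises one of
                 the exceptions in `not_found` when the button is gone.
    not_found:   exception types that mean "nothing more to load".
    pause:       seconds to wait after each click so new content can load.
    max_clicks:  safety cap so the loop cannot run forever.
    Returns the number of clicks performed.
    """
    clicks = 0
    while clicks < max_clicks:
        try:
            button = find_button()
        except not_found:
            # The lookup failed, so there is nothing left to load.
            break
        button.click()
        clicks += 1
        if pause:
            time.sleep(pause)
    return clicks


# Possible Selenium usage (assumes `driver` exists; the XPath is a placeholder):
#
#   from selenium.common.exceptions import NoSuchElementException
#   from selenium.webdriver.common.by import By
#
#   click_until_missing(
#       lambda: driver.find_element(By.XPATH, '//button[@id = "read-more"]'),
#       not_found=(NoSuchElementException,),
#       pause=1.0,
#   )
```

If the site hides the button instead of removing it from the page, the lookup will keep succeeding, so in that case the lookup function would also need to check something like `is_displayed()` and raise when the button is no longer visible.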