Question

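I am scraping IMDB movie reviews.

The problem is that, to load more reviews, the [read-more] button has to be clicked.

However, I don't know how to stop once all the reviews have been loaded.

At the moment I handle this by polling. Is there a smarter way to deal with it?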

When there is more to read:

[screenshot]

When there is nothing more to read:

[screenshot]

Thanks!

Answer 1

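If you are writing this in Python, you can use XPath to extract elements from the HTML page; an example that retrieves reviews is given below. You can wrap the lookups in try/except so that the loop ends when there is no more information on the page. Take a look at the example below; it may help you: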

from selenium.common.exceptions import NoSuchElementException
from selenium.webdriver.common.by import By

# `driver` (a Selenium WebDriver) and `page` (a fallback name) are assumed to be defined earlier.
reviews = driver.find_elements(By.XPATH, '//article[@itemprop = "review"]')
for review in reviews:

    # Initialize an empty dictionary for each review
    review_dict = {}

    # Find xpaths of the fields desired as columns in future data frame.
    # We use try/except because reviews are not required to have all the fields
    # listed below; if a review lacks a field, we leave it blank in that row
    # rather than quit upon receiving an error.
    try:
        # No leading '.', so this XPath searches the whole page, not just this review
        airline = review.find_element(
            By.XPATH, '//div[@class = "review-heading"]//h1[@itemprop = "name"]').text
    except NoSuchElementException:
        airline = page
    try:
        overall = review.find_element(By.XPATH, './/span[@itemprop = "ratingValue"]').text
    except NoSuchElementException:
        overall = ""

In the same way, you can use XPath lookups for the IMDB case and wrap them in try/except, so that no error is raised when there is nothing left to read.
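To make the stopping condition concrete, here is a minimal sketch of a "click until the button disappears" loop. It is not taken from the answer above: `click_until_gone`, `FakePage`, and the `max_clicks` cap are hypothetical names introduced for illustration, and the fake page only stands in for a real browser so the loop can be run without Selenium.

```python
from typing import Callable, Optional, Protocol


class Clickable(Protocol):
    def click(self) -> None: ...


def click_until_gone(find_button: Callable[[], Optional[Clickable]],
                     max_clicks: int = 1000) -> int:
    """Click the button returned by find_button until it stops appearing.

    find_button must return the button, or None once it is absent.
    max_clicks is a safety cap (an assumption, tune as needed).
    Returns the number of clicks performed.
    """
    clicks = 0
    while clicks < max_clicks:
        button = find_button()
        if button is None:
            break  # nothing left to load: this is the stopping condition
        button.click()
        clicks += 1
    return clicks


class FakePage:
    """Stand-in for a page with `remaining` batches of reviews left to load."""

    def __init__(self, remaining: int) -> None:
        self.remaining = remaining

    def find_button(self) -> Optional["FakePage"]:
        # The button only exists while there is something left to load.
        return self if self.remaining > 0 else None

    def click(self) -> None:
        self.remaining -= 1


if __name__ == "__main__":
    page = FakePage(3)
    print(click_until_gone(page.find_button))

    # Hypothetical Selenium wiring; the element id is an assumption and
    # should be checked against the live page:
    #
    # def find_load_more():
    #     try:
    #         return WebDriverWait(driver, 5).until(
    #             EC.element_to_be_clickable((By.ID, "load-more-trigger")))
    #     except TimeoutException:
    #         return None
    #
    # click_until_gone(find_load_more)
```

With Selenium, an explicit wait that times out when the button never becomes clickable gives a definite end to the loop, so a fixed-interval polling loop is not needed.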

How to scrape IMDB? Without pressing the [read more] button
