How to scrape IMDB? Detecting when there is no more [read-more] button to press

Time: 2019-06-03 05:02:09

Tags: python beautifulsoup web-crawler

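I am extracting IMDB movie reviews.

There is a problem: to load more movie reviews, the [read-more] button has to be pressed.

However, once the reviews run out, I don't know how to detect that and stop.

At the moment I handle this by polling. How can I handle it more sensibly?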

When there is more to read:

[screenshot]

When there is nothing more to read:

[screenshot]

Thanks!

1 Answer:

Answer 0: (score: 0)

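If you are writing this in Python, you can use XPath to extract elements from the HTML page; an example of retrieving reviews is given below. Wrap the lookups in try/except so that the loop ends cleanly when the page has no more information. The example below is from a different review site, but it may help: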

from selenium.common.exceptions import NoSuchElementException
from selenium.webdriver.common.by import By

# `driver` is an already-initialised Selenium WebDriver; `page` is assumed
# to be defined earlier in the surrounding script.
reviews = driver.find_elements(By.XPATH, '//article[@itemprop = "review"]')
for review in reviews:

    # Initialize an empty dictionary for each review
    review_dict = {}

    # Find xpaths of the fields desired as columns in future data frame
    # We use the try/except statements to account for the fact that the reviews are not required to have
    # all the fields listed below, and if a review does not have a certain field we wish to make the
    # corresponding field blank in that particular row, rather than quit upon receiving an error.
    try:
        # The leading '//' searches the whole document, not only this review
        airline = review.find_element(
            By.XPATH, '//div[@class = "review-heading"]//h1[@itemprop = "name"]').text
    except NoSuchElementException:
        airline = page
    try:
        overall = review.find_element(By.XPATH, './/span[@itemprop = "ratingValue"]').text
    except NoSuchElementException:
        overall = ""

In the same way, you can use XPath lookups for the IMDB case and wrap them in try/except, so that no error is raised when there is nothing left to read.
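
Applied to the [read-more] button itself, the same idea could look roughly like the sketch below. It is a minimal illustration and not tested against IMDB: `click_until_exhausted`, `find_button` and `FakeButton` are hypothetical names, and the button object is only assumed to offer `is_displayed()` and `click()` the way a Selenium element does. The loop stops either when looking up the button raises an exception or when the button is still in the page but hidden, so no fixed number of polling rounds is needed.

```python
import time


def click_until_exhausted(find_button, missing_exc=Exception, pause=0.0, max_clicks=1000):
    """Click the button returned by find_button() until it is gone or hidden.

    find_button -- hypothetical callable returning an object with
                   is_displayed() and click(), e.g. a Selenium element
    missing_exc -- exception type raised when the button does not exist
    Returns the number of clicks performed.
    """
    clicks = 0
    while clicks < max_clicks:
        try:
            button = find_button()
        except missing_exc:
            break  # button is no longer in the page: nothing left to load
        if not button.is_displayed():
            break  # button still exists but is hidden: also finished
        button.click()
        clicks += 1
        time.sleep(pause)  # give the next batch of reviews time to load
    return clicks


# Stand-in for a real page (an assumption for demonstration only):
# the button hides itself after three clicks.
class FakeButton:
    def __init__(self, remaining):
        self.remaining = remaining

    def is_displayed(self):
        return self.remaining > 0

    def click(self):
        self.remaining -= 1


fake = FakeButton(3)
total_clicks = click_until_exhausted(lambda: fake)
print(total_clicks)
```

With Selenium, `find_button` could be something like `lambda: driver.find_element(By.XPATH, ...)` with `missing_exc=NoSuchElementException` and a non-zero `pause`; the actual locator for the button has to be taken from the page source, since it is not given in the question. The `max_clicks` cap is only a safety net against an endless loop if the button never disappears.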