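Scraping into a pandas DataFrame

Time: 2019-10-09 14:37:15

Tags: python-3.x

I am trying to scrape web reviews into a DataFrame; apologies if this has been asked before. The problem I have is that it scrapes the same review 10 times instead of 10 different reviews.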

'''
import requests
from bs4 import BeautifulSoup
import pandas as pd

url ='https://www.marriott.com/hotels/hotel-reviews/amsnt-amsterdam-marriott-hotel'

for page in range(10):
    page = requests.get("https://www.marriott.com/hotels/hotel-reviews/amsnt-amsterdam-marriott-hotel")

soup = BeautifulSoup(page.content, 'html.parser')

general_data = soup.find_all(class_='bvseo-review')
i = 1
first = general_data[i]
i+=1

for item in general_data:
    span = first.find_all('span')
    description = first.find_all('span', attrs={'itemprop':'description'})
    rating = first.find_all('span', attrs={'itemprop':'ratingValue'})
    auteur = first.find_all('span', attrs={'itemprop':'author'})

pagereviews = pd.DataFrame({
    "description":description,
    "ratingValue":rating,
    "author":auteur
})

pagereviews

'''

The expected result is a DataFrame containing 10 unique reviews.

1 answer:

Answer 0 (score: 0)

I would replace the for loop with

from attr import attrs, attrib
from abc import ABCMeta


class MetaClass(ABCMeta):

    my_attribute = attrib()

    def __new__(metacls, name, bases, namespace, **kw):
        cls = super().__new__(metacls, name, bases, namespace, **kw)
        # if the 'attrs' decorator modifies the type of "cls", 
        # the original __init__ won't be called automatically.
        # since we are inheriting from other superclass, we'd better
        # call it manually here, and suppress its automatic execution
        # below.
        super(MetaClass, cls).__init__(cls, name, bases, namespace, **kw)
        cls.my_attribute = attrib()
        return attrs(cls)

    def __init__(cls, name, bases, namespace, **kw):
        pass


class MyClass(object, metaclass=MetaClass):
    pass
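
For the symptom described in the question (the same review repeated), one likely cause is that the loop body reads from `first` instead of the loop variable `item`, and that `description`, `rating` and `auteur` are overwritten on every pass rather than accumulated. The request loop also fetches the same URL on every iteration, so it cannot return different pages. Below is a minimal sketch of the per-item approach, not a confirmed fix for the real page. The helper names `_text` and `parse_reviews` and the `SAMPLE_HTML` markup are made up for illustration; only the `bvseo-review` class and the `itemprop` values come from the question.

```python
from bs4 import BeautifulSoup
import pandas as pd


def _text(review, itemprop):
    """Return the stripped text of the first matching span, or None if absent."""
    tag = review.find("span", attrs={"itemprop": itemprop})
    return tag.get_text(strip=True) if tag else None


def parse_reviews(html):
    """Build one DataFrame row per element with class 'bvseo-review'."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for review in soup.find_all(class_="bvseo-review"):
        # Read from the current review, not from a fixed element.
        rows.append({
            "description": _text(review, "description"),
            "ratingValue": _text(review, "ratingValue"),
            "author": _text(review, "author"),
        })
    return pd.DataFrame(rows, columns=["description", "ratingValue", "author"])


# Hypothetical markup for illustration only; the real page may differ.
SAMPLE_HTML = """
<div class="bvseo-review"><span itemprop="author">Ann</span>
  <span itemprop="ratingValue">5</span><span itemprop="description">Great stay</span></div>
<div class="bvseo-review"><span itemprop="author">Bob</span>
  <span itemprop="ratingValue">3</span><span itemprop="description">Average room</span></div>
"""

pagereviews = parse_reviews(SAMPLE_HTML)
print(pagereviews)
```

With a live page, the same function could be called as `parse_reviews(requests.get(url).content)`. Collecting reviews from several pages would need a different URL per page; the question does not show how the site paginates, so that part is left out here.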