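Apologies if this has been asked before. I'm trying to scrape web reviews into a DataFrame. The problem is that it scrapes the same review 10 times instead of 10 different reviews.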
```python
import requests
from bs4 import BeautifulSoup
import pandas as pd

url = 'https://www.marriott.com/hotels/hotel-reviews/amsnt-amsterdam-marriott-hotel'
for page in range(10):
    page = requests.get("https://www.marriott.com/hotels/hotel-reviews/amsnt-amsterdam-marriott-hotel")
    soup = BeautifulSoup(page.content, 'html.parser')
    general_data = soup.find_all(class_='bvseo-review')
    i = 1
    first = general_data[i]
    i += 1
    for item in general_data:
        span = first.find_all('span')
        description = first.find_all('span', attrs={'itemprop': 'description'})
        rating = first.find_all('span', attrs={'itemprop': 'ratingValue'})
        auteur = first.find_all('span', attrs={'itemprop': 'author'})
    pagereviews = pd.DataFrame({
        "description": description,
        "ratingValue": rating,
        "author": auteur
    })
pagereviews
```
The desired result is a DataFrame containing 10 unique reviews.
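The repetition can be reproduced without any network access. The sketch below keeps only the loop structure from the question and replaces the scraped tags with hypothetical plain strings. Because `i` is reset to 1 at the top of every page and the inner loop reads `first` instead of `item`, every collected value comes from the same element.

```python
# Plain strings stand in for the bs4 Tags in general_data (hypothetical data).
general_data = ["review 0", "review 1", "review 2"]


def buggy_loop(pages):
    collected = []
    for _ in range(pages):
        i = 1                      # reset to 1 on every page
        first = general_data[i]    # therefore always general_data[1]
        i += 1                     # lost: i is set back to 1 next iteration
        for item in general_data:
            collected.append(first)  # reads `first`, never `item`
    return collected


def fixed_loop():
    collected = []
    for item in general_data:
        collected.append(item)     # use the loop variable instead
    return collected


print(buggy_loop(2))  # six copies of 'review 1'
print(fixed_loop())   # ['review 0', 'review 1', 'review 2']
```

Note that the outer loop in the question also requests the same URL on every iteration, so even with the inner loop fixed it can only ever see one page of reviews.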
Answer 0 (score: 0)
I replaced the for loop with:
from attr import attrs, attrib
from abc import ABCMeta

class MetaClass(ABCMeta):
    my_attribute = attrib()

    def __new__(metacls, name, bases, namespace, **kw):
        cls = super().__new__(metacls, name, bases, namespace, **kw)
        # If the 'attrs' decorator modifies the type of "cls",
        # the original __init__ won't be called automatically.
        # Since we are inheriting from another superclass, we'd better
        # call it manually here and suppress its automatic execution
        # below. (super(...).__init__ is already bound to cls, so cls
        # must not be passed again.)
        super(MetaClass, cls).__init__(name, bases, namespace, **kw)
        cls.my_attribute = attrib()
        return attrs(cls)

    def __init__(cls, name, bases, namespace, **kw):
        pass

class MyClass(object, metaclass=MetaClass):
    pass
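The answer above does not change the scraping loop itself. For the question as asked, a minimal sketch of a fix is to read each field from the current loop element and to build the DataFrame once, from a list of rows, after the loop. It assumes the page still uses the `bvseo-review` class and the `itemprop` spans shown in the question; the helper names `span_text` and `parse_reviews` are hypothetical. Fetching is left as a comment because how this site paginates its reviews is not known here, so no per-page URL is assumed.

```python
import pandas as pd
from bs4 import BeautifulSoup

# Field names taken from the question's markup (itemprop values).
FIELDS = ("description", "ratingValue", "author")


def span_text(review, prop):
    """Text of the first <span itemprop=prop> inside one review, or None."""
    tag = review.find("span", attrs={"itemprop": prop})
    return tag.get_text(strip=True) if tag is not None else None


def parse_reviews(html):
    """Return one DataFrame row per element with class 'bvseo-review'."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    # Read every field from `review`, the current loop element, instead of
    # from one element picked before the loop (general_data[i]).
    for review in soup.find_all(class_="bvseo-review"):
        rows.append({field: span_text(review, field) for field in FIELDS})
    # Build the DataFrame once, after the loop, so all rows are kept.
    return pd.DataFrame(rows, columns=list(FIELDS))


# Fetching (not run here). Requesting the same URL repeatedly returns the
# same page, so a real page-specific URL would be needed for more reviews.
# import requests
# html = requests.get(url, timeout=10).text
# reviews = parse_reviews(html)
```

Using `find` rather than `find_all` puts a single string in each cell instead of a list of `Tag` objects, which is what `pd.DataFrame` needs to produce one row per review.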