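I'm scraping Trustpilot reviews, but the data gets overwritten on every iteration. How can I make it append the data from all pages, rather than keeping only the last one?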
import re
import requests
import pandas as pd
from openpyxl import load_workbook
from bs4 import BeautifulSoup
def get_total_items(url):
    soup = BeautifulSoup(requests.get(url, format(0),headers={"User-Agent":"Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:47.0) Gecko/20100101 Firefox/47.0"}).text, 'lxml')
    stars = []
    star1 = soup.find_all(attrs={"star-rating star-rating--medium"})
    stars.append(star1)
    df = pd.DataFrame(stars, ["Rating"])
    return df

ddf = []
for i in range(29):
    urls = "https://www.trustpilot.com/review/www.pandora.net?page={}"
    get_total_items(urls).append(ddf)
print(ddf)
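Answer 0 (score: 3)
Your loop calls .append on the DataFrame returned by get_total_items instead of on ddf, and it never fills the page number into the URL. Change the for loop as shown below: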
for i in range(29):
    urls = "https://www.trustpilot.com/review/www.pandora.net?page={}"
    ddf.append(get_total_items(urls.format(i)))
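
Note that after this loop ddf is a list holding one DataFrame per page, not a single table. If you want everything in one DataFrame, one option is to combine the list with pd.concat once the loop has finished. Below is a minimal sketch of that pattern, not a drop-in replacement: collect_pages and fake_fetch are hypothetical names used only for illustration, and fake_fetch stands in for get_total_items(urls.format(i)) so the example runs without any network access.

```python
import pandas as pd


def collect_pages(fetch_page, page_numbers):
    """Call fetch_page once per page number and combine the results."""
    frames = []  # one DataFrame per page
    for page in page_numbers:
        # list.append mutates the list in place, so every page is kept
        frames.append(fetch_page(page))
    # Stack the per-page frames into one table with a fresh 0..n-1 index
    return pd.concat(frames, ignore_index=True)


# Hypothetical stand-in for get_total_items: returns made-up ratings
def fake_fetch(page):
    return pd.DataFrame({"Rating": [page, page + 1]})


combined = collect_pages(fake_fetch, range(1, 4))
print(combined)
```

In the real scraper you would pass a function that builds the URL with urls.format(page) and returns the DataFrame for that page. Concatenating once at the end is generally cheaper than growing a DataFrame inside the loop.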