Scraper prints only the last page of data instead of all pages - BS4

Time: 2019-12-27 11:54:48

Tags: python pandas beautifulsoup

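I am scraping Trustpilot reviews, but the data gets overwritten on every iteration. How can I make it append the data from all pages, rather than only the last one?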

import re
import requests
import pandas as pd
from openpyxl import load_workbook 


from bs4 import BeautifulSoup

def get_total_items(url):

    soup = BeautifulSoup(requests.get(url, format(0),headers={"User-Agent":"Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:47.0) Gecko/20100101 Firefox/47.0"}).text, 'lxml')
    stars = []
    star1 = soup.find_all(attrs={"star-rating star-rating--medium"})
    stars.append(star1)
    df = pd.DataFrame(stars, ["Rating"])
    return df

ddf = []
for i in range(29): 
    urls = "https://www.trustpilot.com/review/www.pandora.net?page={}"  
    get_total_items(urls).append(ddf)

print(ddf)

1 answer:

Answer 0 (score: 3)

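Change the for loop as shown below. The original loop called .append(ddf) on the DataFrame returned by get_total_items, so nothing was ever added to the ddf list, and it never substituted the page number into the {} placeholder of the URL: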

for i in range(29): 
    urls = "https://www.trustpilot.com/review/www.pandora.net?page={}"  
    ddf.append(get_total_items(urls.format(i)))
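
With this loop, ddf ends up as a list holding one DataFrame per page. If a single table is wanted, the per-page frames can be concatenated at the end. Below is a minimal sketch of that idea, not a tested Trustpilot scraper. The helper names (parse_ratings, fetch_html, scrape_all) and the "page" column are made up for illustration, and it assumes that each rating element carries the star-rating--medium class and keeps its label in an img alt attribute, which may not match the live page markup.

```python
import pandas as pd
import requests
from bs4 import BeautifulSoup

# URL template from the question; {} is replaced by the page number.
URL_TEMPLATE = "https://www.trustpilot.com/review/www.pandora.net?page={}"
HEADERS = {"User-Agent": "Mozilla/5.0"}


def parse_ratings(html, page):
    """Return a DataFrame with one row per rating element found in `html`."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    # Assumption: every rating element has the "star-rating--medium" class.
    for node in soup.find_all(class_="star-rating--medium"):
        img = node.find("img")
        # Assumption: the label is in <img alt="...">; otherwise use the text.
        label = img.get("alt", "") if img else node.get_text(strip=True)
        rows.append({"page": page, "Rating": label})
    return pd.DataFrame(rows, columns=["page", "Rating"])


def fetch_html(page):
    """Download one review page and return its HTML."""
    response = requests.get(URL_TEMPLATE.format(page), headers=HEADERS, timeout=10)
    response.raise_for_status()
    return response.text


def scrape_all(pages, fetch=fetch_html):
    """Parse every page and combine the per-page frames into one DataFrame."""
    frames = []
    for page in pages:
        # Append the result to the list, not the list to the result.
        frames.append(parse_ratings(fetch(page), page))
    return pd.concat(frames, ignore_index=True)
```

Calling scrape_all(range(29)) would mirror the loop above and return one combined DataFrame. The fetch parameter exists only so the parsing and accumulation can be tried on saved HTML without hitting the site.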