How do I loop through a list of lists and then create a CSV at the end?

Asked: 2019-02-17 00:17:05

Tags: python pandas dataframe web-scraping beautifulsoup

I have a script that scrapes the data and puts it into four separate lists, but I want to combine them all into a single dataframe and write the final result out as a CSV. The Guests column also contains multiple people, so how do I loop through that list? I'm not sure why my current code isn't working, but it's probably something simple. Thanks.

import requests
import pandas as pd
from bs4 import BeautifulSoup
import numpy as np

page = requests.get("https://en.wikipedia.org/wiki/List_of_QI_episodes")
soup = BeautifulSoup(page.content, "lxml")
my_tables = soup.find_all("table",{"class":"wikitable plainrowheaders wikiepisodetable"})
for table in my_tables:
    table_rows = table.find_all("tr")
    for tr in table_rows:
        td = tr.find_all("td")
        row = [i.text for i in td]
        if len(td) == 4:
            NoInSeason = td[0].find(text=True)
            Guests = td[1].find(text=True)
            Winner  = td[2].find(text=True)
            OriginalAirDate = td[3].find(text=True)     
            df = pd.DataFrame(np.column_stack([NoInSeason, Guests, Winner, OriginalAirDate]), 
             columns=['NumberInSeason', 'Guests', 'Winner', 'OriginalAirDate'])
            print(df)
            df.to_csv("output.csv")

1 answer:

Answer 0 (score: 1)

You have a couple of bugs: the episode rows actually contain five <td> cells (so your length check and column indices are off), and you recreate the DataFrame and overwrite output.csv on every matching row instead of appending to one DataFrame and writing it once at the end. Here is a fixed version of your code:

import requests
import pandas as pd
from bs4 import BeautifulSoup
import numpy as np

# create a single DataFrame once, before the loop
df = pd.DataFrame(columns=['NoInSeason', 'Guests', 'Winner', 'OriginalAirDate'])
page = requests.get("https://en.wikipedia.org/wiki/List_of_QI_episodes")
soup = BeautifulSoup(page.content, "lxml")
my_tables = soup.find_all("table",{"class":"wikitable plainrowheaders wikiepisodetable"})
for table in my_tables:
    table_rows = table.find_all("tr")
    for tr in table_rows:
        td = tr.find_all("td")
        if len(td) == 5:  # episode rows have five <td> cells
            NoInSeason = td[0].find(text=True)
            Guests = td[2].find(text=True)  # td[1] (the episode title) is skipped
            Winner = td[3].find(text=True)
            OriginalAirDate = td[4].find(text=True)
            df = df.append({'NoInSeason': NoInSeason, 'Guests': Guests, 'Winner': Winner, 'OriginalAirDate': OriginalAirDate}, ignore_index=True)

# print and write the CSV once, after all tables have been processed
print(df)
df.to_csv("output.csv")
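
As a side note: `DataFrame.append` has since been deprecated and removed in pandas 2.0, and newer BeautifulSoup releases prefer `string=` over `text=`. Below is a minimal sketch of the same fix written for current library versions, collecting plain dicts and building the DataFrame once at the end; it assumes the episode rows still contain five `<td>` cells in the same order as in the answer above.

import requests
import pandas as pd
from bs4 import BeautifulSoup

page = requests.get("https://en.wikipedia.org/wiki/List_of_QI_episodes")
soup = BeautifulSoup(page.content, "lxml")

rows = []  # collect one dict per episode row, then build the DataFrame once
for table in soup.find_all("table", {"class": "wikitable plainrowheaders wikiepisodetable"}):
    for tr in table.find_all("tr"):
        td = tr.find_all("td")
        if len(td) == 5:  # assumed layout: no. in season, title, guests, winner, air date
            rows.append({
                "NoInSeason": td[0].get_text(strip=True),
                # join all guest names in the cell instead of taking only the first text node
                "Guests": td[2].get_text(", ", strip=True),
                "Winner": td[3].get_text(strip=True),
                "OriginalAirDate": td[4].get_text(strip=True),
            })

df = pd.DataFrame(rows, columns=["NoInSeason", "Guests", "Winner", "OriginalAirDate"])
print(df)
df.to_csv("output.csv", index=False)

Building the DataFrame from a list of dicts is also much faster than appending row by row, and `get_text(", ", strip=True)` addresses the "multiple guests" part of the question by joining every name in the cell rather than returning only the first one.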