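I have some code that produces scraped data and puts it into 4 data lists, but I want to put them all together as one DataFrame and output the final result as a CSV. The Guests column also contains multiple people, so how do I iterate through that list? I'm not sure why my current code isn't working, but it's probably something simple. Thanks.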
import requests
import pandas as pd
from bs4 import BeautifulSoup
import numpy as np
page = requests.get("https://en.wikipedia.org/wiki/List_of_QI_episodes")
soup = BeautifulSoup(page.content, "lxml")
my_tables = soup.find_all("table",{"class":"wikitable plainrowheaders wikiepisodetable"})
for table in my_tables:
    table_rows = table.find_all("tr")
    for tr in table_rows:
        td = tr.find_all("td")
        row = [i.text for i in td]
        if len(td) == 4:
            NoInSeason = td[0].find(text=True)
            Guests = td[1].find(text=True)
            Winner = td[2].find(text=True)
            OriginalAirDate = td[3].find(text=True)
df = pd.DataFrame(np.column_stack([NoInSeason, Guests, Winner, OriginalAirDate]),
                  columns=['NumberInSeason', 'Guests', 'Winner', 'OriginalAirDate'])
print(df)
df.to_csv("output.csv")
Answer (score: 1)
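You had a few errors. Your code checks for rows with 4 td cells, but the episode rows have 5, so Guests, Winner and OriginalAirDate have to be read from td[2], td[3] and td[4]. Your loop also overwrote the same variables on every iteration instead of keeping each row. Here is a fixed version of your code that collects every row before building the DataFrame.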
import requests
import pandas as pd
from bs4 import BeautifulSoup

page = requests.get("https://en.wikipedia.org/wiki/List_of_QI_episodes")
soup = BeautifulSoup(page.content, "lxml")
my_tables = soup.find_all("table", {"class": "wikitable plainrowheaders wikiepisodetable"})

# Collect one dict per episode row and build the DataFrame once at the end
# (DataFrame.append was deprecated and then removed in pandas 2.0)
rows = []
for table in my_tables:
    table_rows = table.find_all("tr")
    for tr in table_rows:
        td = tr.find_all("td")
        # Episode rows are assumed to have 5 <td> cells; td[1] is skipped
        if len(td) == 5:
            NoInSeason = td[0].find(string=True)  # "string" replaces the older "text" argument
            Guests = td[2].find(string=True)
            Winner = td[3].find(string=True)
            OriginalAirDate = td[4].find(string=True)
            rows.append({'NoInSeason': NoInSeason, 'Guests': Guests,
                         'Winner': Winner, 'OriginalAirDate': OriginalAirDate})

df = pd.DataFrame(rows, columns=['NoInSeason', 'Guests', 'Winner', 'OriginalAirDate'])
print(df)
df.to_csv("output.csv")
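The fix above still keeps only the first text node of the Guests cell, so it does not cover the part of the question about cells that list several people. Below is a minimal sketch of one way to handle that, run against a small hypothetical HTML fragment instead of the live page. It assumes the same 5-cell layout and column positions as the answer, and that each guest is a link inside the cell (falling back to splitting the plain text on commas). The names parse_rows and SAMPLE_HTML, and the placeholder guest names, are made up for illustration; the real page markup may differ.

```python
from bs4 import BeautifulSoup
import pandas as pd

# Hypothetical fragment mimicking one episode row; NOT real data from the page.
SAMPLE_HTML = """
<table class="wikitable plainrowheaders wikiepisodetable">
  <tr><th>No.</th><th>No. in series</th><th>Title</th><th>Guests</th><th>Winner</th><th>Air date</th></tr>
  <tr>
    <th>1</th><td>1</td><td>Episode title</td>
    <td><a href="#">Guest A</a>, <a href="#">Guest B</a>, <a href="#">Guest C</a></td>
    <td>Guest B</td><td>1 January 2000</td>
  </tr>
</table>
"""

COLUMNS = ["NoInSeason", "Guests", "Winner", "OriginalAirDate"]


def parse_rows(html):
    """Hypothetical helper: return one DataFrame row per episode, Guests as a list."""
    soup = BeautifulSoup(html, "html.parser")  # built-in parser, no lxml needed
    rows = []
    for table in soup.find_all("table", {"class": "wikitable plainrowheaders wikiepisodetable"}):
        for tr in table.find_all("tr"):
            td = tr.find_all("td")
            if len(td) != 5:  # header rows contain only <th>, so they are skipped
                continue
            # Take every guest link in the cell, not just the first text node
            guests = [a.get_text(strip=True) for a in td[2].find_all("a")]
            if not guests:
                # Assumed fallback: plain-text guests separated by commas
                guests = [g.strip() for g in td[2].get_text().split(",") if g.strip()]
            rows.append({
                "NoInSeason": td[0].get_text(strip=True),
                "Guests": guests,
                "Winner": td[3].get_text(strip=True),
                "OriginalAirDate": td[4].get_text(strip=True),
            })
    return pd.DataFrame(rows, columns=COLUMNS)


df = parse_rows(SAMPLE_HTML)

# Option 1: one row per guest (episode columns repeated for each guest)
per_guest = df.explode("Guests", ignore_index=True)

# Option 2: one row per episode, guests joined into a single CSV-friendly string
df["Guests"] = df["Guests"].str.join("; ")

print(per_guest)
print(df)
# per_guest.to_csv("output.csv", index=False)  # or df.to_csv(...)
```

Exploding gives one row per guest, which makes it easy to filter or count by person; joining keeps one row per episode, which is usually simpler in a flat CSV. To try it on the real page, pass page.content instead of SAMPLE_HTML.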