Writing table data from a web page to a csv file

Date: 2017-09-20 14:28:49

Tags: python python-3.x pandas csv web-scraping

I've written a script in Python to parse some data from a web page and write it to a csv file via pandas. So far, the script parses every table available on that page, but when it comes to writing the csv file, only the last table from the page ends up in it. Because of the loop, each iteration's data overwrites the previous one. How can I fix this flaw so that my scraper writes the data from all the different tables rather than only the last one? Thanks in advance.

import csv
import requests 
from bs4 import BeautifulSoup
import pandas as pd


res = requests.get('http://www.espn.com/nba/schedule/_/date/20171001').text
soup = BeautifulSoup(res,"lxml")
for table in soup.find_all("table"):
    df = pd.read_html(str(table))[0]
    df.to_csv("table_item.csv")  # overwritten on every iteration, so only the last table survives
    print(df)

By the way, I'd like to write the data to the csv file using pandas only. Thanks again.

1 Answer:

Answer 0 (score: 1)

You can pass the URL of the web page straight to read_html, which returns a list of DataFrames, so they just need to be combined with concat:

import pandas as pd

dfs = pd.read_html('http://www.espn.com/nba/schedule/_/date/20171001')

df = pd.concat(dfs, ignore_index=True)
# if necessary, rename the auto-generated 'Unnamed' columns
d = {'Unnamed: 1':'a', 'Unnamed: 7':'b'}
df = df.rename(columns=d)
print(df.head())
               matchup               a  time (ET)  nat tv  away tv  home tv  \
0          Atlanta ATL       Miami MIA        NaN     NaN      NaN      NaN   
1               LA LAC     Toronto TOR        NaN     NaN      NaN      NaN   
2  Guangzhou Guangzhou  Washington WSH        NaN     NaN      NaN      NaN   
3        Charlotte CHA      Boston BOS        NaN     NaN      NaN      NaN   
4          Orlando ORL     Memphis MEM        NaN     NaN      NaN      NaN   

                           tickets   b  
0  2,401 tickets available from $6 NaN  
1   284 tickets available from $29 NaN  
2  2,792 tickets available from $2 NaN  
3  2,908 tickets available from $6 NaN  
4  1,508 tickets available from $3 NaN  

Finally, to_csv is used to write the file:

df.to_csv("table_item.csv", index=False)
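Alternatively, if you'd rather not collect the tables first, to_csv also accepts mode='a', so each table can be appended to one file as it is parsed. A minimal sketch, using two hypothetical stand-in DataFrames in place of the scraped tables:

```python
import pandas as pd

# Hypothetical stand-ins for the tables parsed from the page.
dfs = [pd.DataFrame({'matchup': ['Atlanta ATL'], 'time (ET)': ['7:00 PM']}),
       pd.DataFrame({'matchup': ['LA LAC'], 'time (ET)': ['7:30 PM']})]

for i, df in enumerate(dfs):
    # Write mode and header only on the first iteration; append afterwards.
    df.to_csv("all_tables.csv", mode="w" if i == 0 else "a",
              header=(i == 0), index=False)
```

This only works cleanly when all tables share the same columns; otherwise the concat approach above is safer.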

EDIT:

For learning purposes, you can append each DataFrame to a list and then concat them:

import requests
from bs4 import BeautifulSoup
import pandas as pd

res = requests.get('http://www.espn.com/nba/schedule/_/date/20171001').text
soup = BeautifulSoup(res, "lxml")
dfs = []
for table in soup.find_all("table"):
    df = pd.read_html(str(table))[0]
    dfs.append(df)

df = pd.concat(dfs, ignore_index=True)
print(df)

df.to_csv("table_item.csv")
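If one file per table is preferred over a single combined file, giving each iteration a distinct filename also avoids the overwrite. A sketch, again with hypothetical stand-in DataFrames instead of the scraped tables:

```python
import pandas as pd

# Hypothetical stand-ins for the tables found by find_all("table").
dfs = [pd.DataFrame({'matchup': ['Charlotte CHA'], 'time (ET)': ['7:00 PM']}),
       pd.DataFrame({'matchup': ['Orlando ORL'], 'time (ET)': ['7:00 PM']})]

for i, df in enumerate(dfs):
    # Distinct filename per table: table_item_0.csv, table_item_1.csv, ...
    df.to_csv(f"table_item_{i}.csv", index=False)
```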