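I scraped a table from Wikipedia using pandas and BeautifulSoup and ended up with a list. I want to convert it to a DataFrame, but when I call pd.DataFrame() on it the result is not what I expect. Please help.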
import pandas as pd
import numpy as np
import requests
from bs4 import BeautifulSoup
res = requests.get("https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M")
soup = BeautifulSoup(res.content,'lxml')
table = soup.find_all('table')[0]
df = pd.read_html(str(table))
print(df[0].to_json(orient='records'))
Everything works up to this point, but then I tried the following code:
neigh = pd.DataFrame(df)
It returns output with only one row and one column.
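Answer 0 (score: 2)
You already have a pandas DataFrame; it is just wrapped in a list, because pd.read_html returns a list of DataFrames. You only need to take the first element: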
neigh = df[0]
print(neigh)
    Postcode           Borough          Neighbourhood
0        M1A      Not assigned           Not assigned
1        M2A      Not assigned           Not assigned
2        M3A        North York              Parkwoods
3        M4A        North York       Victoria Village
4        M5A  Downtown Toronto           Harbourfront
..       ...               ...                    ...
282      M8Z         Etobicoke              Mimico NW
283      M8Z         Etobicoke     The Queensway West
284      M8Z         Etobicoke  Royal York South West
285      M8Z         Etobicoke         South of Bloor
286      M9Z      Not assigned           Not assigned

[287 rows x 3 columns]
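To make the list-versus-DataFrame distinction concrete without hitting the network, here is a minimal sketch. The names `sample` and `tables` are made up for illustration, and `tables` is only a stand-in for what pd.read_html returns; the two rows are copied from the output above.

```python
import pandas as pd

# Hypothetical stand-in for the result of pd.read_html:
# a plain Python list that holds one DataFrame.
sample = pd.DataFrame(
    {
        "Postcode": ["M3A", "M4A"],
        "Borough": ["North York", "North York"],
        "Neighbourhood": ["Parkwoods", "Victoria Village"],
    }
)
tables = [sample]

print(type(tables))     # the container is a list
print(type(tables[0]))  # the element inside it is the DataFrame

# Index into the list instead of wrapping it in pd.DataFrame().
neigh = tables[0]
print(neigh.shape)
```

Indexing returns the same DataFrame object that the parser built, so no conversion step is needed.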
Answer 1 (score: 2)
You can use the pandas read_html function to read the tables directly from the URL:
>>> url = "https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M"
>>> tables = pd.read_html(url)
>>> len(tables)
3
>>> tables[0]
    Postcode           Borough          Neighbourhood
0        M1A      Not assigned           Not assigned
1        M2A      Not assigned           Not assigned
2        M3A        North York              Parkwoods
3        M4A        North York       Victoria Village
4        M5A  Downtown Toronto           Harbourfront
..       ...               ...                    ...
282      M8Z         Etobicoke              Mimico NW
283      M8Z         Etobicoke     The Queensway West
284      M8Z         Etobicoke  Royal York South West
285      M8Z         Etobicoke         South of Bloor
286      M9Z      Not assigned           Not assigned

[287 rows x 3 columns]
>>> type(tables[0])
<class 'pandas.core.frame.DataFrame'>
read_html reads every table tag found at the URL and returns a list of DataFrames, one per table.
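Since a page can contain several tables, it may help to select the one you want by its content rather than by position. The sketch below is only an illustration: the inline HTML string is invented, with two rows copied from the output above, and it assumes an HTML parser such as lxml is installed, as in the question. It uses the `match` parameter of read_html, which keeps only tables whose text matches the given string or regex.

```python
import io
import pandas as pd

# Invented two-table page used instead of the live Wikipedia URL.
html = """
<table>
  <tr><th>Postcode</th><th>Borough</th><th>Neighbourhood</th></tr>
  <tr><td>M3A</td><td>North York</td><td>Parkwoods</td></tr>
  <tr><td>M4A</td><td>North York</td><td>Victoria Village</td></tr>
</table>
<table>
  <tr><th>Other</th></tr>
  <tr><td>x</td></tr>
</table>
"""

# Without a filter, every table on the page comes back as its own DataFrame.
all_tables = pd.read_html(io.StringIO(html))
print(len(all_tables))

# With match=..., only tables containing the given text are returned.
matching = pd.read_html(io.StringIO(html), match="Postcode")
neigh = matching[0]
print(neigh)
```

Even with `match`, the return value is still a list, so the `[0]` index is still required.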
Answer 2 (score: 1)
You already have the DataFrame inside df; it is the first element of the list:
print(df[0])