Question

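I scraped a table from Wikipedia using pandas and BeautifulSoup and ended up with a list. I want to convert it to a DataFrame, but when I call pd.DataFrame() the result is not what I expect. Please help.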

import pandas as pd
import numpy as np
import requests
from bs4 import BeautifulSoup
res = requests.get("https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M")
soup = BeautifulSoup(res.content,'lxml')
table = soup.find_all('table')[0] 
df = pd.read_html(str(table))
print(df[0].to_json(orient='records'))

Everything works up to this point, but then I tried the following code:

neigh = pd.DataFrame(df)

It returns output with only one row and one column.

Answer 1

You already have a pandas DataFrame; it is just wrapped in a list. You only need to take the first element:

neigh = df[0]
print(neigh)

    Postcode           Borough          Neighbourhood
0        M1A      Not assigned           Not assigned
1        M2A      Not assigned           Not assigned
2        M3A        North York              Parkwoods
3        M4A        North York       Victoria Village
4        M5A  Downtown Toronto           Harbourfront
..       ...               ...                    ...
282      M8Z         Etobicoke              Mimico NW
283      M8Z         Etobicoke     The Queensway West
284      M8Z         Etobicoke  Royal York South West
285      M8Z         Etobicoke         South of Bloor
286      M9Z      Not assigned           Not assigned

[287 rows x 3 columns]
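
To make the point concrete without hitting the network, here is a minimal sketch. It assumes a hand-built list named tables as a stand-in for what pd.read_html returns; the three sample rows are copied from the output above and are not the full table.

```python
import pandas as pd

# Stand-in for the list returned by pd.read_html (assumption: a tiny sample,
# not the real 287-row Wikipedia table).
tables = [
    pd.DataFrame(
        {
            "Postcode": ["M3A", "M4A", "M5A"],
            "Borough": ["North York", "North York", "Downtown Toronto"],
            "Neighbourhood": ["Parkwoods", "Victoria Village", "Harbourfront"],
        }
    )
]

# The outer object is a plain Python list, not a DataFrame.
print(type(tables).__name__, len(tables))

# Indexing the list yields the DataFrame itself; no pd.DataFrame() call is needed.
neigh = tables[0]
print(type(neigh).__name__, neigh.shape)
```

Wrapping the list in pd.DataFrame() asks pandas to build a new frame out of a list whose only element is itself a DataFrame, which is why the result does not look like the table.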

Answer 2

You can use pandas' read_html function to read the tables directly from the URL:

>>> import pandas as pd
>>> url = "https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M"
>>> tables = pd.read_html(url)
>>> len(tables)
3
>>> tables[0]
    Postcode           Borough          Neighbourhood
0        M1A      Not assigned           Not assigned
1        M2A      Not assigned           Not assigned
2        M3A        North York              Parkwoods
3        M4A        North York       Victoria Village
4        M5A  Downtown Toronto           Harbourfront
..       ...               ...                    ...
282      M8Z         Etobicoke              Mimico NW
283      M8Z         Etobicoke     The Queensway West
284      M8Z         Etobicoke  Royal York South West
285      M8Z         Etobicoke         South of Bloor
286      M9Z      Not assigned           Not assigned

[287 rows x 3 columns]
>>> type(tables[0])
<class 'pandas.core.frame.DataFrame'>

read_html reads every table tag at the URL and returns a list of DataFrames.
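
Since the page yields several tables, relying on index 0 can be fragile if the page layout changes. One possible approach is to pick the table by the columns you expect. The sketch below assumes a hypothetical helper named pick_table and a hand-built stand-in list; neither comes from pandas or from the original answer.

```python
import pandas as pd


def pick_table(tables, required_columns):
    """Return the first DataFrame whose columns include all required_columns.

    Hypothetical helper for illustration; not part of pandas.
    """
    required = set(required_columns)
    for table in tables:
        if required.issubset(table.columns):
            return table
    raise ValueError(f"no table has columns {sorted(required)}")


# Stand-in for pd.read_html(url): three tables, only one of which holds the data
# (assumption: the other two are made-up placeholders).
tables = [
    pd.DataFrame({0: ["navigation"], 1: ["links"]}),
    pd.DataFrame(
        {
            "Postcode": ["M3A", "M4A"],
            "Borough": ["North York", "North York"],
            "Neighbourhood": ["Parkwoods", "Victoria Village"],
        }
    ),
    pd.DataFrame({0: ["footer"]}),
]

neigh = pick_table(tables, ["Postcode", "Borough", "Neighbourhood"])
print(neigh)
```

read_html also accepts a match argument that restricts the result to tables containing the given text, which may be enough on its own for simple pages.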

Answer 3

You already have the DataFrame inside df; it is the first element of the list:

print(df[0])

How do I convert a list to a DataFrame in pandas?
