如何将数据从HTML表中抓取到Python列表/字典中?

时间:2019-05-08 00:39:03

标签: python web beautifulsoup screen-scraping

我正在尝试将数据从Baseball Prospectus导入Python表/字典(会更好吗?)。

下面是我所拥有的,基于以下内容:使用Python自动完成无聊的事情。

我发现我的方法无法正确使用这些功能,但是我无法弄清楚应该使用哪些工具。

import requests
import webbrowser
import bs4

res = requests.get('https://legacy.baseballprospectus.com/card/70917/trea-turner')
res.raise_for_status()
webpage = bs4.BeautifulSoup(res.text)
table = webpage.select('newstat_career_log_datagrid')
list = []
for item in table:
    list.append(item)

print(list)

1 个答案:

答案 0 :(得分:0)

使用pandas数据框架首先获取MLB Statistics表,然后将数据框架转换为字典对象。如果没有安装pandas,则可以在单个命令中完成。

 pip install pandas

然后使用下面的代码。

import pandas as pd
df=pd.read_html('https://legacy.baseballprospectus.com/card/70917/trea-turner')
data_dict = df[5].to_dict()
print(data_dict)

输出:

{'PA': {0: 44, 1: 324, 2: 447, 3: 740, 4: 15, 5: 1570}, '2B': {0: 1, 1: 14, 2: 24, 3: 27, 4: 1, 5: 67}, 'TEAM': {0: 'WAS', 1: 'WAS', 2: 'WAS', 3: 'WAS', 4: 'WAS', 5: 'Career'}, 'SB': {0: 2, 1: 33, 2: 46, 3: 43, 4: 4, 5: 128}, 'G': {0: 27, 1: 73, 2: 98, 3: 162, 4: 4, 5: 364}, 'HR': {0: 1, 1: 13, 2: 11, 3: 19, 4: 2, 5: 46}, 'FRAA': {0: 0.5, 1: -3.2, 2: 0.2, 3: 7.1, 4: -0.1, 5: 4.5}, 'BWARP': {0: 0.1, 1: 2.4, 2: 2.7, 3: 5.0, 4: 0.1, 5: 10.4}, 'CS': {0: 2, 1: 6, 2: 8, 3: 9, 4: 0, 5: 25}, '3B': {0: 0, 1: 8, 2: 6, 3: 6, 4: 0, 5: 20}, 'H': {0: 9, 1: 105, 2: 117, 3: 180, 4: 5, 5: 416}, 'AGE': {0: '22', 1: '23', 2: '24', 3: '25', 4: '26', 5: 'Career'}, 'OBP': {0: 0.295, 1: 0.37, 2: 0.33799999999999997, 3: 0.344, 4: 0.4, 5: 0.34700000000000003}, 'AVG': {0: 0.225, 1: 0.342, 2: 0.284, 3: 0.271, 4: 0.35700000000000004, 5: 0.289}, 'DRC+': {0: 77, 1: 128, 2: 99, 3: 107, 4: 103, 5: 108}, 'SO': {0: 12, 1: 59, 2: 80, 3: 132, 4: 5, 5: 288}, 'YEAR': {0: '2015', 1: '2016', 2: '2017', 3: '2018', 4: '2019', 5: 'Career'}, 'SLG': {0: 0.325, 1: 0.5670000000000001, 2: 0.451, 3: 0.41600000000000004, 4: 0.857, 5: 0.46}, 'DRAA': {0: -1.0, 1: 11.4, 2: 1.0, 3: 8.5, 4: 0.1, 5: 20.0}, 'HBP': {0: 0, 1: 1, 2: 4, 3: 5, 4: 0, 5: 10}, 'BRR': {0: 0.1, 1: 5.9, 2: 6.8, 3: 2.7, 4: 0.2, 5: 15.7}, 'BB': {0: 4, 1: 14, 2: 30, 3: 69, 4: 1, 5: 118}}