Question

Here is my current code:

from bs4 import BeautifulSoup
import requests

url = requests.get("http://eiupanthers.com/boxscore.aspx?path=baseball&id=5065").content

soup = BeautifulSoup(url, 'html.parser')

table = soup.find('table', {'class': 'sidearm-table play-by-play'})

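My table variable keeps coming back empty (or 'None'). This may just be a syntax problem. I am very proficient in Matlab, but I am fairly new to Python / BeautifulSoup / Requests / etc.

Any pointers would be greatly appreciated.

I am mainly trying to extract the data from the play-by-play table so that I can parse it in a separate program and assemble data structures for individual players. I am confident I can handle that part once I have the data collected.

Thanks for your help!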

Answer 1

from bs4 import BeautifulSoup
import requests

# Send a browser-like User-Agent; the site appears to require one.
header = {'User-agent' : 'Mozilla/5.0 (Windows; U; Windows NT 5.1; de; rv:1.9.1.5) Gecko/20091102 Firefox/3.5.5'}

# .text returns the decoded response body as a string.
html = requests.get("http://eiupanthers.com/boxscore.aspx?path=baseball&id=5065", headers=header).text

soup = BeautifulSoup(html, 'html.parser')
table = soup.find('table', {'class': 'sidearm-table play-by-play'})

print(table)

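The problem seems to be that the site requires some kind of request header (here, a User-Agent). The requests module supports custom headers well, but you still have to pass them explicitly, as in the code above.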
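Once the page comes back with the table in it, the rows still have to be pulled out of the markup. Below is a minimal sketch of that step, not a definitive implementation. The function name `extract_table_rows` and the `sample_html` string are made up for illustration, and the real page's markup may differ. It uses a CSS selector instead of the exact class string, on the assumption that the class order in the page could vary, and it returns an empty list instead of failing when no table is found.

```python
from bs4 import BeautifulSoup


def extract_table_rows(html, selector="table.sidearm-table.play-by-play"):
    """Return the cell text of each row in the first table matching selector.

    Returns an empty list when no table matches, instead of failing on None.
    """
    soup = BeautifulSoup(html, "html.parser")
    # A CSS selector matches the listed classes in any order, whereas a
    # multi-class string lookup compares against the attribute value as written.
    table = soup.select_one(selector)
    if table is None:
        return []

    rows = []
    for tr in table.find_all("tr"):
        cells = [cell.get_text(strip=True) for cell in tr.find_all(["th", "td"])]
        if cells:  # skip rows that contain no header or data cells
            rows.append(cells)
    return rows


# Hypothetical sample markup for illustration; not copied from the real page.
sample_html = """
<table class="play-by-play sidearm-table">
  <tr><th>Inning</th><th>Play</th></tr>
  <tr><td>1</td><td>Smith singled to left field.</td></tr>
</table>
"""

rows = extract_table_rows(sample_html)
print(rows)
```

With the real page, you would pass the downloaded text instead of `sample_html`, for example the string returned by `requests.get(..., headers=header).text`. An empty result then suggests that either the request was rejected or the table is not present in the HTML that was returned.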

How can I scrape data from a specific table in HTML using BeautifulSoup, Requests, and Python?
