Question

I've used Beautiful Soup successfully in the past (I'm still learning how to use it), but I'm stuck on how to get one specific table from this page:

https://fantasydata.com/nfl-stats/point-spreads-and-odds?season=2017&seasontype=1&week=1

In the past, this was as simple as:

import requests
from bs4 import BeautifulSoup

url = 'https://fantasydata.com/nfl-stats/point-spreads-and-odds?season=2017&seasontype=1&week=1'
html = requests.get(url)
soup = BeautifulSoup(html.text, "html.parser")

or

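from bs4 import BeautifulSoup
from selenium import webdriver

year, nfl_week = 2017, 1  # values substituted into the URL below
driver = webdriver.Chrome()
page_url = 'https://fantasydata.com/nfl-stats/point-spreads-and-odds?season=%s&seasontype=1&week=%s' % (year, nfl_week)
driver.get(page_url)
soup = BeautifulSoup(driver.page_source, "lxml")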

But none of the data in the table can be parsed from the page source.

Is there a better way to go about this with Beautiful Soup?

Edit:

OK, so I went back and ran:

driver = webdriver.Chrome()
page_url = 'https://fantasydata.com/nfl-stats/point-spreads-and-odds?season=%s&seasontype=1&week=%s' % (year, nfl_week)
driver.get(page_url)
soup = BeautifulSoup(driver.page_source, "lxml")

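again, and this time the data showed up. Since the table is loaded dynamically, I figured Selenium was the right approach, but I was thrown off when it didn't work.

Any ideas why it didn't work the first time? I didn't close the browser or anything else before the page had loaded.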

Answer 1

You don't need BeautifulSoup or Selenium. The data is available as a Python dictionary when you POST a query to https://fantasydata.com/NFLTeamStats/Odds_Read.

import requests

query = {  # just mimicking sample query that I saw after loading your link
    'page': 1,
    'pageSize': 50,
    'filters.season': 2017,
    'filters.seasontype': 1,
    'filters.week': 1,
}
response = requests.post('https://fantasydata.com/NFLTeamStats/Odds_Read', data=query)
data = response.json()
data
{'Data': [{'Date': 'September 7, 2017 8:30 PM', 'Favorite': 'at Patriots', 'PointSpread': '-9.0', 'UnderDog': 'Chiefs', 'OverUnder': '48.0', 'AwayTeamMoneyLine': '+400', 'HomeTeamMoneyLine': '-450'}, {'Date': 'September 10, 2017 1:00 PM', 'Favorite': 'Buccaneers', 'PointSpread': '-2.5', 'UnderDog': 'at Dolphins', 'OverUnder': '41.5', 'AwayTeamMoneyLine': '-140', 'HomeTeamMoneyLine': '+120'}, {'Date': 'September 10, 2017 1:00 PM', 'Favorite': 'at ...
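Every value in that response is a string, so a little post-processing makes the rows easier to work with. Here is a minimal sketch, using only the first two records shown above. The helper name `tidy_game` and the output keys are made up for illustration, and it assumes every field is populated and that dates always follow the format seen in the sample:

```python
from datetime import datetime

# First two records copied from the sample output above.
sample = {
    "Data": [
        {"Date": "September 7, 2017 8:30 PM", "Favorite": "at Patriots",
         "PointSpread": "-9.0", "UnderDog": "Chiefs", "OverUnder": "48.0",
         "AwayTeamMoneyLine": "+400", "HomeTeamMoneyLine": "-450"},
        {"Date": "September 10, 2017 1:00 PM", "Favorite": "Buccaneers",
         "PointSpread": "-2.5", "UnderDog": "at Dolphins", "OverUnder": "41.5",
         "AwayTeamMoneyLine": "-140", "HomeTeamMoneyLine": "+120"},
    ]
}


def tidy_game(raw):
    """Convert one raw record's string fields into typed Python values.

    Hypothetical helper: assumes no field is empty or missing.
    """
    return {
        "kickoff": datetime.strptime(raw["Date"], "%B %d, %Y %I:%M %p"),
        "favorite": raw["Favorite"],
        "underdog": raw["UnderDog"],
        "spread": float(raw["PointSpread"]),
        "over_under": float(raw["OverUnder"]),
        "away_moneyline": int(raw["AwayTeamMoneyLine"]),
        "home_moneyline": int(raw["HomeTeamMoneyLine"]),
    }


games = [tidy_game(record) for record in sample["Data"]]
for game in games:
    print(game["kickoff"], game["favorite"], game["spread"], game["underdog"])
```

With a live response you would pass `data['Data']` instead of `sample['Data']`. If the site ever returns blank fields, the `float` and `int` calls would need guarding.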

You can find this by looking at the Network section of Chrome's developer tools (press F12), in particular the XHR sub-section.
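Once the XHR call has been spotted in the Network tab, it can be rebuilt and inspected before sending anything. The sketch below uses `requests.Request(...).prepare()` to show exactly what would go over the wire. The function name `build_odds_request` is hypothetical, and the two extra headers are only a guess at what a browser would send. Nothing in this thread confirms the site requires them.

```python
import requests

API_URL = "https://fantasydata.com/NFLTeamStats/Odds_Read"
PAGE_URL = "https://fantasydata.com/nfl-stats/point-spreads-and-odds"


def build_odds_request(season, seasontype, week, page=1, page_size=50):
    """Build (but do not send) the form-encoded POST seen in the Network tab."""
    form = {
        "page": page,
        "pageSize": page_size,
        "filters.season": season,
        "filters.seasontype": seasontype,
        "filters.week": week,
    }
    # Assumption: headers a browser-issued XHR typically carries.
    headers = {
        "X-Requested-With": "XMLHttpRequest",
        "Referer": PAGE_URL,
    }
    return requests.Request("POST", API_URL, data=form, headers=headers).prepare()


prepared = build_odds_request(2017, 1, 1)
print(prepared.method, prepared.url)
print(prepared.body)  # the urlencoded form payload

# To actually send it:
#   response = requests.Session().send(prepared, timeout=10)
#   data = response.json()
```

Comparing `prepared.body` and `prepared.headers` with what the Network tab shows is a quick way to see which parts of the browser request you have not reproduced yet.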

Answer 2

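OK, so I poked around a bit and someone beat me to it!!! I did the same thing as the other answer, but in the Firefox dev tools (ctrl + shift + k), using the XHR section of the Network tab just like the other answer did. It looks like the API call that populates the table on that page is a POST request to https://fantasydata.com/NFLTeamStats/Odds_Read. Here is a JS object with all of the available parameters:

{
  filter:,
  filters.endweek:,
  filters.exportType:,
  filters.leaguetype:,
  filters.minimumsnaps:,
  filters.playerid:,
  filters.position:,
  filters.scope:,
  filters.scoringsystem:,
  filters.searchtext:,
  filters.season: 2017,
  filters.seasontype: 1,
  filters.startweek:,
  filters.stattype:,
  filters.subscope:,
  filters.team:,
  filters.teamaspect:,
  filters.week: 1,
  group:,
  page: 1,
  pageSize: 50,
  sort:
}

The body of the POST will be an object like the one above. If they don't block cross-origin requests, you can use the Python requests library directly. If they do block cross-origin requests, you can try mimicking the headers and options they set. Alternatively (I forget how to set it up), I know you can inject a JavaScript AJAX request into the page from Selenium. As a side note, if you want to automate fetching asynchronously loaded JSON from Python, you have to use WebDriverWait or some other async code to wait for the response.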

Beautiful Soup - having trouble with a specific page
