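I want to use BeautifulSoup to extract a dynamic table from a website (https://datagolf.org/performance-table). However, when I use soup.find() to look up the table's source, the output contains nothing. This is the code I'm using: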
import requests
from bs4 import BeautifulSoup

url = 'https://datagolf.org/performance-table'
headers = {'User-Agent': 'Mozilla/5.0'}
response = requests.get(url, headers=headers)
soup = BeautifulSoup(response.content, 'html.parser')
box = soup.find('div', {'class': 'table-div'})
box
The output of the code above is:
<div class="table-div">
</div>
When I change the class to class_='table', the output is blank. Any ideas about what is happening here? Could I be requesting the wrong source?
Answer 0 (score: 1)
I tried Beautiful Soup and it did not work, but it does work with Selenium. I wrote this code for it:
# Note: executable_path and the find_elements_by_* helpers are Selenium 3 style;
# later Selenium 4 releases removed them.
from selenium import webdriver
from selenium.webdriver.common.keys import Keys

driver = webdriver.Firefox(executable_path='geckodriver.exe')
driver.get("https://datagolf.org/performance-table")
l = []
l1 = []
#a = driver.find_element_by_class_name('table')
#print(a.text) # this will print all of the table content
# collect the header cells
b = driver.find_elements_by_class_name('datahead')
for d in b:
    l1.append(d.text)
l1.pop(5)
l.append(l1)
# collect the data rows; each row's text has one cell per line
c = driver.find_elements_by_class_name('datarow')
l1 = []
for d in c:
    e = d.text
    e = e.split('\n')
    l.append(e)
print(l) # this will print table as a list
driver.close()
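For newer Selenium versions, the same idea can be sketched roughly as follows. This is only a sketch under assumptions: it assumes Selenium 4 with a Firefox driver that Selenium can locate on its own, it assumes the page still uses the datahead and datarow class names, and the function names build_table and scrape_table are made up for illustration. The explicit wait is there because the rows are filled in by JavaScript after the page loads.

```python
def build_table(header_cells, row_texts, drop_index=5):
    """Combine header cell texts and newline-delimited row texts into a list of lists.

    drop_index mirrors the l1.pop(5) call in the code above.
    """
    header = [h for i, h in enumerate(header_cells) if i != drop_index]
    return [header] + [text.split('\n') for text in row_texts]


def scrape_table(url, timeout=10):
    """Open the page in Firefox, wait for the rows to render, return the table."""
    # Imported here so build_table can be used without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Firefox()
    try:
        driver.get(url)
        # Block until at least one data row exists in the DOM.
        WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located((By.CLASS_NAME, 'datarow'))
        )
        heads = [e.text for e in driver.find_elements(By.CLASS_NAME, 'datahead')]
        rows = [e.text for e in driver.find_elements(By.CLASS_NAME, 'datarow')]
    finally:
        driver.quit()
    return build_table(heads, rows)


# Usage (launches a browser):
# table = scrape_table('https://datagolf.org/performance-table')
```

Keeping the list-building step in its own function means it can be checked without opening a browser.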
Answer 1 (score: 1)
The data is stored in the page as JSON; you can parse it with the re and json modules.
For example:
import re
import json
import requests
url = 'https://datagolf.org/performance-table'
txt = requests.get(url).text
data = json.loads(re.search(r"var reload_data = JSON\.parse\('(.*?)'", txt).group(1))
# uncomment this to print all data:
# print(json.dumps(data, indent=4))
# print some data to screen:
for row in data['data']['2020']['table']:
    print('{:<40} {}'.format(row['player_name'], row['wins']))
Prints:
McIlroy, Rory                            1.0
Hatton, Tyrrell                          1.0
Rahm, Jon                                0.0
Thomas, Justin                           2.0
Schauffele, Xander                       0.0
Matsuyama, Hideki                        1.0
Reed, Patrick                            1.0
Woods, Tiger                             1.0
...and so on.
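If the page layout changes, re.search returns None and the one-liner above fails with an AttributeError. A slightly more defensive variant might look like the sketch below. The function name extract_reload_data and the sample string are made up for illustration; the sample is a tiny stand-in for the real page, not its actual content.

```python
import json
import re

# Same pattern as above: capture the string passed to JSON.parse('...').
PATTERN = re.compile(r"var reload_data = JSON\.parse\('(.*?)'")


def extract_reload_data(page_text):
    """Return the parsed reload_data object embedded in the page source."""
    match = PATTERN.search(page_text)
    if match is None:
        raise ValueError("reload_data not found; the page may have changed")
    return json.loads(match.group(1))


# Hypothetical miniature page, shaped like the real data but much smaller.
sample = (
    "<script>var reload_data = JSON.parse('"
    '{"data": {"2020": {"table": [{"player_name": "McIlroy, Rory", "wins": 1.0}]}}}'
    "');</script>"
)

data = extract_reload_data(sample)
for row in data['data']['2020']['table']:
    print('{:<40} {}'.format(row['player_name'], row['wins']))
```

With the real site you would pass requests.get(url).text instead of sample.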
Edit: the data has the following format:
...
{
    "amateur": 0,
    "app_raw": 0.9807287716094194,
    "app_true": 1.1416339999999998,
    "arg_raw": 0.30359835879467356,
    "arg_true": 0.35591150000000005,
    "dg_id": 10091,
    "events": 8,
    "exp_major_wins": 0.0,
    "exp_pga_wins": 1.5499999999999998,
    "flag": "NIR",
    "ott_raw": 0.699243421907403,
    "ott_true": 0.8408904999999999,
    "player_name": "McIlroy, Rory",
    "putt_raw": 0.07181996378995552,
    "putt_true": 0.16352450000000002,
    "rnds": 29,
    "sg_raw": 2.5018271707385242,
    "sg_true": 2.9106948275862066,
    "shotlink_rnds": 20.0,
    "t2g_raw": 1.983570552311496,
    "t2g_true": 2.3384359999999997,
    "tour": "PGA",
    "wins": 1.0
},
...
You can use the keys app_true, putt_true, arg_true, and so on.
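As one way to use those keys, here is a minimal sketch that writes selected columns to CSV text with the standard csv module. The helper name rows_to_csv is made up for illustration, and sample_rows contains only the McIlroy record shown above (trimmed to a few keys); with the real page you would pass data['data']['2020']['table'] instead.

```python
import csv
import io


def rows_to_csv(rows, keys):
    """Write the chosen keys of each row dict as CSV and return the text."""
    buffer = io.StringIO()
    # extrasaction='ignore' skips keys that were not requested.
    writer = csv.DictWriter(
        buffer, fieldnames=keys, extrasaction='ignore', lineterminator='\n'
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# One record copied from the format shown above, trimmed to a few keys.
sample_rows = [
    {
        "player_name": "McIlroy, Rory",
        "app_true": 1.1416339999999998,
        "putt_true": 0.16352450000000002,
        "arg_true": 0.35591150000000005,
        "wins": 1.0,
    }
]

csv_text = rows_to_csv(
    sample_rows, ["player_name", "app_true", "putt_true", "arg_true"]
)
print(csv_text)
```

The player name is quoted in the output because it contains a comma.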