I'm trying to scrape all the data from this website:
http://www.dartsdatabase.co.uk/PlayerStats.aspx?statKey=1&pg=7
However, I can't work out how to iterate through the 'stat' dropdown menu. Each of its options shows a table that I need to scrape.
So far I have the following code, which lists the label and value of each option in the dropdown:
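import requests
from bs4 import BeautifulSoup

url = 'http://www.dartsdatabase.co.uk/PlayerStats.aspx'
response = requests.get(url).text
soup = BeautifulSoup(response, "lxml")
# Every <option> inside the <select name="stat"> dropdown
drop = soup.find('select', {'name': 'stat'}).find_all("option")
options = []  # visible label of each option
val = []      # its 'value' attribute, which is what the form submits
for option in drop:
    options.append(option.text)
    val.append(option['value'])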
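To show what the option extraction amounts to without BeautifulSoup or lxml, here is a minimal sketch using only the standard library's html.parser (a swapped-in technique, not the one used above). It assumes the dropdown is a <select name="stat"> whose <option> tags carry value attributes; SelectOptionParser and stat_options are hypothetical names.

```python
from html.parser import HTMLParser


class SelectOptionParser(HTMLParser):
    """Collect (value, text) pairs for <option> tags inside one named <select>."""

    def __init__(self, select_name):
        super().__init__()
        self.select_name = select_name
        self.in_select = False      # True while inside the target <select>
        self.current_value = None   # value of the <option> being read, if any
        self.options = []           # list of (value, text) tuples

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "select" and attrs.get("name") == self.select_name:
            self.in_select = True
        elif tag == "option" and self.in_select:
            # A new <option> also ends the previous one, even without </option>
            self.current_value = attrs.get("value", "")
            self.options.append((self.current_value, ""))

    def handle_data(self, data):
        if self.in_select and self.current_value is not None:
            value, text = self.options[-1]
            self.options[-1] = (value, text + data)

    def handle_endtag(self, tag):
        if tag == "option":
            self.current_value = None
        elif tag == "select" and self.in_select:
            self.in_select = False
            self.current_value = None


def stat_options(html, select_name="stat"):
    """Return [(value, label), ...] for the options of the named <select>."""
    parser = SelectOptionParser(select_name)
    parser.feed(html)
    parser.close()
    # Strip whitespace around the visible label text
    return [(value, text.strip()) for value, text in parser.options]
```

Keeping value and label together as pairs avoids the two parallel lists in the code above drifting out of step.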
Any help would be greatly appreciated!
Answer (score: 2)
Send a POST request that changes the stat parameter. You can collect the values to use from the value attribute of each option on the page.
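The idea of keeping every form field fixed and changing only stat can be sketched as a generator of POST bodies. This is a minimal sketch: the field names and values are copied from the answer's form data, and payloads_for_stats is a hypothetical helper. Copying the base dict for each request avoids mutating shared state, which the answer's loop does when it assigns data['stat'].

```python
# Form fields as posted in the answer; this sketch does not verify them
# against the live site.
BASE_FORM = {
    "nameSearch": "",
    "dateFrom": "02/10/2017",
    "dateTo": "02/10/2019",
    "organStat": "All",
    "stat": "1",
    "tourns": "All",
    "pg": "7",
}


def payloads_for_stats(base_form, stat_values):
    """Yield one POST body per stat value, leaving base_form untouched."""
    for value in stat_values:
        form = dict(base_form)  # shallow copy is enough: every field is a string
        form["stat"] = value    # only 'stat' differs between requests
        yield form
```

Each yielded dict can be passed as data= to session.post().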
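import requests
import pandas as pd
from io import StringIO
from bs4 import BeautifulSoup as bs

URL = 'http://www.dartsdatabase.co.uk/PlayerStats.aspx?statKey=1&pg=7'

# Form fields the page posts back; only 'stat' changes between requests
data = {
    'nameSearch': '',
    'dateFrom': '02/10/2017',
    'dateTo': '02/10/2019',
    'organStat': 'All',
    'stat': '1',
    'tourns': 'All',
    'pg': '7'
}

def get_soup(s):
    # POST the current form data through the shared session and parse the reply
    r = s.post(URL, data=data)
    return bs(r.content, 'lxml')

def get_table(soup):
    # The stats table is the <table> that directly follows a <br>;
    # StringIO avoids pandas' deprecation of passing literal HTML strings
    return pd.read_html(StringIO(str(soup.select_one('br + table'))))[0]

with requests.Session() as s:
    soup = get_soup(s)
    print(get_table(soup))
    # Values of the 'stat' dropdown; the first option is skipped because
    # the request above (stat '1') presumably already covered it
    stats = [i['value'] for i in soup.select('[name="stat"] option')][1:]
    for i in stats:
        data['stat'] = i
        soup = get_soup(s)
        print(get_table(soup))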
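If you want to keep the tables rather than print them, one option is to collect them in a dict keyed by each option's label. The sketch below assumes you already have (value, label) pairs for the dropdown and a function that, given a stat value, posts the form and returns the parsed table, as the loop above does; scrape_all_stats, fetch_table and delay_seconds are hypothetical names. Passing the fetch function in as a parameter keeps the loop testable without network access.

```python
import time
from typing import Callable, Dict, Iterable, Tuple, TypeVar

T = TypeVar("T")


def scrape_all_stats(
    options: Iterable[Tuple[str, str]],
    fetch_table: Callable[[str], T],
    delay_seconds: float = 0.0,
) -> Dict[str, T]:
    """Call fetch_table(value) for each (value, label) option; key results by label."""
    results: Dict[str, T] = {}
    for value, label in options:
        results[label] = fetch_table(value)
        if delay_seconds:
            time.sleep(delay_seconds)  # optional pause between requests
    return results
```

If each result is a pandas DataFrame, pd.concat also accepts a dict of DataFrames and uses its keys as the outer level of the resulting index, so the returned dict can be combined into a single table.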