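I am trying to scrape the selected value from a drop-down menu on a web page. How do I narrow the scrape down to the right level?
I have tried many combinations of find and find_all on select, option, the option value, and selected="".
In this HTML, I want the value of the option marked selected="":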
<select name="aar"><option value="2019/2020">2019/2020</option>
<option value="2018/2019" selected="">2018/2019</option><option
value="2017/2018">2017/2018</option><option
I expect 2018/2019 as my result.
My current code:
import requests
from bs4 import BeautifulSoup

for i in range(2018, 2019):
    url = 'https://superstats.dk/program?aar={}%2F{}'.format(i, i + 1)
    html_doc = requests.get(url)
    soup = BeautifulSoup(html_doc.content, "lxml")
    # find_all returns every <select> element on the page, not only the selected option
    aar = soup.find_all("select")
    print(aar)
Answer 0 (score: 0)
Use a CSS selector that matches on the selected attribute to get the value:
import requests
from bs4 import BeautifulSoup

for i in range(2018, 2019):
    url = 'https://superstats.dk/program?aar={}%2F{}'.format(i, i + 1)
    html_doc = requests.get(url)
    soup = BeautifulSoup(html_doc.content, "lxml")
    # option[selected] matches the first <option> that carries a selected attribute
    optionval = soup.select_one('option[selected]')['value']
    print(optionval)
Output:
2018/2019
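If the page contains more than one drop-down, a bare option[selected] could match an option in a different menu. Here is a minimal offline sketch that scopes the selector to the select element named aar. It assumes the fragment from the question (closed by hand here so it parses cleanly) and uses the built-in html.parser instead of lxml; the names html, selected and selected_value are only illustrative.

```python
from bs4 import BeautifulSoup

# HTML fragment copied from the question; the closing tags were added by hand.
html = """
<select name="aar">
<option value="2019/2020">2019/2020</option>
<option value="2018/2019" selected="">2018/2019</option>
<option value="2017/2018">2017/2018</option>
</select>
"""

soup = BeautifulSoup(html, "html.parser")

# Restrict the match to options inside <select name="aar">.
selected = soup.select_one('select[name="aar"] option[selected]')

# select_one returns None when nothing matches, so guard before indexing.
selected_value = selected["value"] if selected is not None else None
print(selected_value)
```

The None check matters on the live site: if the markup changes or the request fails, indexing the result directly would raise a TypeError.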
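Answer 1 (score: 0)
The code I wrote for you collects every season from the site into a text file. All that is left to do is print one line for every 9 characters (the first 9, the next 9, and so on). Don't forget to install the libraries I use in the code.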
import urllib.request
from bs4 import BeautifulSoup

# The URL template has two {} placeholders; fill them in before requesting.
url = 'https://superstats.dk/program?aar={}%2F{}'.format(2018, 2019)
html_doc = urllib.request.urlopen(url)
soup = BeautifulSoup(html_doc, "html.parser")
show_select = str(soup.find_all("select"))

# Dump the raw <select> markup to a text file.
with open("test.txt", "w+") as file:
    file.write(show_select)

with open("test.txt", "r") as file:
    lines = file.readlines()

# Keep only digits and slashes from every line of the dump.
with open("All numbers.txt", "w+") as file2:
    for each_line in lines:
        for number in each_line:
            if number in "0123456789/":
                file2.write(number)
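The remaining step described in this answer, cutting the run of digits and slashes into 9-character seasons, could be sketched roughly as follows. This assumes the text really is a clean concatenation of YYYY/YYYY strings; any other digits or slashes in the select markup on the live page would shift the chunks. The split_seasons helper and the sample string are hypothetical, built from the options shown in the question (each option contributes its value and its text, hence the repeats).

```python
def split_seasons(digits, width=9):
    """Cut a string such as '2019/20202018/2019' into width-sized chunks."""
    return [digits[i:i + width] for i in range(0, len(digits), width)]


# Hypothetical sample: each season appears twice (option value, then option text).
sample = "2019/20202019/20202018/20192018/2019"

seasons = split_seasons(sample)

# Drop the repeats while keeping first-seen order.
unique_seasons = list(dict.fromkeys(seasons))

for season in unique_seasons:
    print(season)
```

To run it on the real output, read the contents of "All numbers.txt" into a string and pass that to split_seasons in place of sample.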