Scraping the selected value from a drop-down menu

Time: 2019-07-15 19:59:53

Tags: python beautifulsoup

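I am trying to scrape the selected value from a drop-down menu on a web page. How do I narrow the scrape down to the right level?

I have tried many combinations of find and find_all on select, option, the option value, and selected="".

In this HTML, I want the value of the option that carries selected="":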

<select name="aar"><option value="2019/2020">2019/2020</option> 
     <option value="2018/2019" selected="">2018/2019</option><option 
     value="2017/2018">2017/2018</option><option 

I expect 2018/2019 as my result.

My current code:

import requests
from bs4 import BeautifulSoup

for i in range(2018, 2019):
    url = 'https://superstats.dk/program?aar={}%2F{}'.format(i, i + 1)
    html_doc = requests.get(url)
    soup = BeautifulSoup(html_doc.content, "lxml")
    # Returns every <select> element on the page, not just the selected option
    aar = soup.find_all("select")
    print(aar)

2 answers:

Answer 0 (score: 0)

Use a CSS selector that matches on the selected attribute to get the value.

import requests
from bs4 import BeautifulSoup

for i in range(2018, 2019):
    url = 'https://superstats.dk/program?aar={}%2F{}'.format(i, i + 1)
    html_doc = requests.get(url)
    soup = BeautifulSoup(html_doc.content, "lxml")
    # option[selected] matches the first <option> that has a selected attribute
    optionval = soup.select_one('option[selected]')['value']
    print(optionval)

Output:

  

2018/2019
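
The same lookup can be tried offline against the markup from the question. The following is a minimal sketch, not the answerer's code: the helper names selected_value and selected_value_find are hypothetical, the HTML string is the question's snippet closed off so that it parses on its own, and html.parser is used only to avoid the lxml dependency. Scoping the selector to select[name="aar"] is an added assumption that guards against other drop-downs on the page.

```python
from bs4 import BeautifulSoup

# Markup copied from the question, closed off so it parses on its own.
HTML = """
<select name="aar">
  <option value="2019/2020">2019/2020</option>
  <option value="2018/2019" selected="">2018/2019</option>
  <option value="2017/2018">2017/2018</option>
</select>
"""


def selected_value(html, select_name):
    """CSS-selector variant: value of the selected <option>, or None."""
    soup = BeautifulSoup(html, "html.parser")
    selector = 'select[name="{}"] option[selected]'.format(select_name)
    option = soup.select_one(selector)
    return option["value"] if option is not None else None


def selected_value_find(html, select_name):
    """find() variant: same result without CSS selectors."""
    soup = BeautifulSoup(html, "html.parser")
    # "name" must go through attrs, because find(name=...) means the tag name.
    select = soup.find("select", attrs={"name": select_name})
    if select is None:
        return None
    # selected=True matches any <option> that has the attribute at all.
    option = select.find("option", selected=True)
    return option["value"] if option is not None else None


print(selected_value(HTML, "aar"))
```

Both helpers return None instead of raising when the drop-down or the selected option is missing, which may be easier to handle than the TypeError that indexing a failed select_one result would produce.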

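Answer 1 (score: 0)

What the code I wrote for you does is print one line for every 9 characters, so you end up with all the years from that website in a txt file. Don't forget to install the libraries I use in the code.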

  • 2019/2020
  • 2018/2019
  • 2017/2018
  • 2016/2017
  • 2015/2016
  • and so on

import re
import urllib.request
from bs4 import BeautifulSoup

# The original left the {} placeholders unfilled; 2018/2019 is used here,
# matching the question.
url = 'https://superstats.dk/program?aar={}%2F{}'.format(2018, 2019)
html_doc = urllib.request.urlopen(url)
soup = BeautifulSoup(html_doc, "html.parser")

# Dump the raw <select> markup to a text file
show_select = str(soup.find_all("select"))
with open("test.txt", "w") as file:
    file.write(show_select)

with open("test.txt", "r") as file:
    markup = file.read()

# Each season label is 9 characters long: four digits, "/", four digits.
# Matching the value attributes picks up every season exactly once per option.
seasons = re.findall(r'value="(\d{4}/\d{4})"', markup)

# Write one season per line
with open("All numbers.txt", "w") as file2:
    for season in seasons:
        file2.write(season + "\n")
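
If the goal is only the list of seasons, the intermediate text files may not be needed. The sketch below reads the option values straight from the parsed tree. It is an illustration under stated assumptions, not the answerer's method: all_option_values is a hypothetical helper, and the HTML string is the question's snippet closed off so that it runs without a network request.

```python
from bs4 import BeautifulSoup

# Markup copied from the question, closed off so it parses on its own.
HTML = """
<select name="aar">
  <option value="2019/2020">2019/2020</option>
  <option value="2018/2019" selected="">2018/2019</option>
  <option value="2017/2018">2017/2018</option>
</select>
"""


def all_option_values(html, select_name):
    """Return the value of every <option> inside the named <select>."""
    soup = BeautifulSoup(html, "html.parser")
    options = soup.select('select[name="{}"] option'.format(select_name))
    # Skip options without a value attribute rather than raising KeyError.
    return [opt["value"] for opt in options if opt.has_attr("value")]


seasons = all_option_values(HTML, "aar")
print("\n".join(seasons))
```

To use this against the live page, the HTML string would be replaced by the response body fetched with urllib.request or requests, as in the answers above.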