How to get the text of an option

Time: 2016-12-22 13:01:50

Tags: python web-scraping beautifulsoup

I parsed the HTML content and got:

datatag = '<select name="cyfbh" style="width:100%"><option value=""></option>\n<option selected="selected" value="440615000101">440615000101-passenger car</option>\n<option value="440615000102">440615000102-two/three wheeled motorcycle</option></select>'

How can I extract "440615000101-passenger car" from the code above? How do I get the text of an option?

I tried the following, but I get the complete option, not the value.

"440615000102-two/three wheeled motorcycle"

4 Answers:

Answer 0: (score: 0)

How about:

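from bs4 import BeautifulSoup

# parse the markup; datatag is the string from the question
soup = BeautifulSoup(datatag, 'html.parser')

# find all option tags
res = soup.find_all('option')
# collect the text of each of those options
options = [x.text for x in res]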

This produces the following result in options:

['', '440615000101-passenger car', '440615000102-two/three wheeled motorcycle']

To exclude any empty strings from options, just try:

no_empties = [x for x in options if len(x) > 0]
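A length check keeps entries that contain only whitespace. As a minimal sketch, assuming the markup from the question plus one hypothetical whitespace-only option added purely for illustration, get_text(strip=True) trims each text before the filter runs:

```python
from bs4 import BeautifulSoup

# Markup from the question, plus one whitespace-only option that is NOT in
# the original data (added here only to illustrate the stripping).
html = (
    '<select name="cyfbh" style="width:100%"><option value=""></option>\n'
    '<option value="">   </option>\n'
    '<option selected="selected" value="440615000101">440615000101-passenger car</option>\n'
    '<option value="440615000102">440615000102-two/three wheeled motorcycle</option></select>'
)

soup = BeautifulSoup(html, "html.parser")

# get_text(strip=True) removes leading and trailing whitespace from the text
texts = [opt.get_text(strip=True) for opt in soup.find_all("option")]

# an empty string is falsy, so this drops both the empty and the blank option
non_empty = [t for t in texts if t]
print(non_empty)
```

The built-in "html.parser" backend is used here so that nothing beyond bs4 itself is required.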

Answer 1: (score: 0)

from bs4 import BeautifulSoup

a="""
<select name="cyfbh" style="width:100%"><option value=""></option>\n<option selected="selected" value="440615000101">440615000101-passenger car</option>\n<option value="440615000102">440615000102-two/three wheeled motorcycle</option></select>
"""
soup = BeautifulSoup(a, 'html.parser')  # name the parser explicitly to avoid a warning
b = soup.select("option")
print(b[1].text)  # b[0] is the empty option
print(b[2].text)

Output:

440615000101-passenger car
440615000102-two/three wheeled motorcycle

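Answer 2: (score: 0)

First, use find_all to get all the <option> tags.

Then use a for loop to go through the options one by one, and use .text to get only the text inside each option.

You can also use if to skip options with empty text.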

from bs4 import BeautifulSoup

datatag = '''<select name="cyfbh" style="width:100%">
<option value=""></option>\n
<option selected="selected" value="440615000101">440615000101-passenger car</option>\n
<option value="440615000102">440615000102-two/three wheeled motorcycle</option>
</select>'''

soup = BeautifulSoup(datatag, 'lxml')

all_options = soup.find_all('option')

for option in all_options:
    if option.text: # skip empty options
        print('    text:', option.text)
        print('   value:', option['value']) # no default; raises KeyError if the attribute is missing
        #print('   value:', option.get('value')) # default value `None`
        #print('   value:', option.get('value', 'FooBar')) # default value 'FooBar'
        print('selected:', option.get('selected'))

Or, more concisely, to build a list containing all the texts:

all_options = soup.find_all('option')

text = [option.text for option in all_options if option.text]
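The question asks specifically for "440615000101-passenger car", which is the option carrying the selected attribute. As a minimal sketch, assuming that only the selected option is wanted and using the built-in html.parser backend, the lookup can filter on that attribute directly:

```python
from bs4 import BeautifulSoup

# Markup copied from the question
datatag = (
    '<select name="cyfbh" style="width:100%"><option value=""></option>\n'
    '<option selected="selected" value="440615000101">440615000101-passenger car</option>\n'
    '<option value="440615000102">440615000102-two/three wheeled motorcycle</option></select>'
)

soup = BeautifulSoup(datatag, "html.parser")

# selected=True matches any <option> that has a "selected" attribute,
# whatever its value; find() returns None when nothing matches
selected = soup.find("option", selected=True)
selected_text = selected.text if selected is not None else None
print(selected_text)

# The same lookup written as a CSS attribute selector
same = soup.select_one("option[selected]")
```

Both find and select_one return None when no option is selected, so the None check is worth keeping for markup that may not have a preselected entry.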

Answer 3: (score: 0)

I would go with:

from lxml import etree
datatag = '<select name="cyfbh" style="width:100%"><option value=""></option>\n<option selected="selected" value="440615000101">440615000101-passenger car</option>\n<option value="440615000102">440615000102-two/three wheeled motorcycle</option></select>'
datatag = etree.HTML(datatag)
values = datatag.xpath("//option[@value!='']/text()")

This way you end up with a list containing the values you are looking for.
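If only the preselected entry is needed, the XPath predicate can test for the selected attribute instead of a non-empty value. This is a sketch under that assumption, not part of the original answer; the variable names tree and selected_texts are illustrative:

```python
from lxml import etree

# Markup copied from the question
datatag = (
    '<select name="cyfbh" style="width:100%"><option value=""></option>\n'
    '<option selected="selected" value="440615000101">440615000101-passenger car</option>\n'
    '<option value="440615000102">440615000102-two/three wheeled motorcycle</option></select>'
)

tree = etree.HTML(datatag)

# [@selected] keeps only <option> elements that have a "selected" attribute;
# text() returns their text nodes as a list of strings
selected_texts = tree.xpath("//option[@selected]/text()")
print(selected_texts)
```

The result is still a list, so it will be empty rather than raise an error when no option is marked as selected.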