Python BeautifulSoup: get the select option's value instead of its text

Asked: 2013-10-15 22:53:46

Tags: python html select beautifulsoup

<select>
  <option value="0">2002/12</option>
  <option value="1">2003/12</option>
  <option value="2">2004/12</option>
  <option value="3">2005/12</option>
  <option value="4">2006/12</option>
  <option value="5" selected>2007/12</option>
</select>

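Given this markup, I need the value '0' rather than the text '2002/12'.

I have tried many BS4 options: .stripped_strings, .strip(), .contents, get(), and so on.

How do I get the value instead of the text?

2 answers: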

Answer 0: (score: 18)

You want the value attribute; tag attributes are accessed with mapping (subscript) syntax:

option['value']

Demo:

>>> from bs4 import BeautifulSoup
>>> soup = BeautifulSoup('''\
... <select>
...   <option value="0">2002/12</option>
...   <option value="1">2003/12</option>
...   <option value="2">2004/12</option>
...   <option value="3">2005/12</option>
...   <option value="4">2006/12</option>
...   <option value="5" selected>2007/12</option>
... </select>
... ''', 'html.parser')
>>> for option in soup.find_all('option'):
...     print('value: {}, text: {}'.format(option['value'], option.text))
... 
value: 0, text: 2002/12
value: 1, text: 2003/12
value: 2, text: 2004/12
value: 3, text: 2005/12
value: 4, text: 2006/12
value: 5, text: 2007/12
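
Since the markup marks one option as selected, a common follow-up is reading only that option's value, or coping with an option that has no value attribute at all. The following is a minimal sketch, assuming bs4 is installed and the built-in 'html.parser' is used; the third option without a value and the names HTML, selected_value, and values are illustrative additions, not part of the original question.

```python
from bs4 import BeautifulSoup

# Shortened copy of the question's markup; the last <option> (no value
# attribute) is a hypothetical addition to show the missing-attribute case.
HTML = """
<select>
  <option value="0">2002/12</option>
  <option value="5" selected>2007/12</option>
  <option>no value attribute</option>
</select>
"""

soup = BeautifulSoup(HTML, "html.parser")

# attrs={"selected": True} matches any <option> that carries the attribute,
# whatever its value is.
selected = soup.find("option", attrs={"selected": True})
selected_value = selected["value"] if selected is not None else None

# tag.get() returns None (or a supplied default) when the attribute is
# absent, whereas tag["value"] would raise KeyError.
values = [option.get("value") for option in soup.find_all("option")]

print(selected_value)
print(values)
```

Subscript access is the stricter choice when every option is expected to have a value; .get() is more forgiving with untidy markup.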

Answer 1: (score: 2)

Like this:

>>> import BeautifulSoup
>>> doc = """
... <select>
...   <option value="0">2002/12</option>
...   <option value="1">2003/12</option>
...   <option value="2">2004/12</option>
...   <option value="3">2005/12</option>
...   <option value="4">2006/12</option>
...   <option value="5" selected>2007/12</option>
... </select>
... """
>>> soup = BeautifulSoup.BeautifulSoup(doc)
>>> options = soup.findAll('option')  # avoid shadowing the built-in list
>>> for option in options:
...   print(option['value'])
... 
0
1
2
3
4
5
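
The session above imports the older BeautifulSoup 3 package. As a rough sketch of the same idea with bs4 (assumed to be installed), the values and texts can be collected into a dictionary in one pass; the name value_to_text is illustrative only.

```python
from bs4 import BeautifulSoup

doc = """
<select>
  <option value="0">2002/12</option>
  <option value="1">2003/12</option>
  <option value="2">2004/12</option>
  <option value="3">2005/12</option>
  <option value="4">2006/12</option>
  <option value="5" selected>2007/12</option>
</select>
"""

soup = BeautifulSoup(doc, "html.parser")

# Map each option's value attribute to its visible text.
value_to_text = {
    option["value"]: option.get_text(strip=True)
    for option in soup.find_all("option")
}

print(value_to_text["0"])
```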