How to scrape only the options with a specific option value

Time: 2018-12-23 16:55:22

Tags: python web-scraping

Suppose I have a list like this:

<option value="Mango/20181106/UK">06/11/2018</option>,
<option value="Orange/20181104/CN">04/11/2018</option>,
<option value="Apple/20181031/CN">31/10/2018</option>,
<option value="Orange/20181028/CN">28/10/2018</option>,

How can I scrape only those options whose value starts with "Orange"?

Partial code:

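import requests
from bs4 import BeautifulSoup

url = 'myurl'  # placeholder for the real page URL
url_content = requests.get(url)
html_content = url_content.text
soup = BeautifulSoup(html_content, 'lxml')

# Narrow down to the table cell that holds the <option> elements
soup2 = soup.find('div', class_="rowDiv5")
data = soup2.find('td', class_="tdAlignR")
options = data.find_all("option")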

2 answers:

Answer 0 (score: 2)

It is more efficient to use a CSS selector with the ^ operator (which matches the start of an attribute value):

from bs4 import BeautifulSoup as bs

html = """
<option value="Mango/20181106/UK">06/11/2018</option>,
<option value="Orange/20181104/CN">04/11/2018</option>,
<option value="Apple/20181031/CN">31/10/2018</option>,
<option value="Orange/20181028/CN">28/10/2018</option>
"""
soup = bs(html, 'lxml')
items = [item.text for item in soup.select('option[value^="Orange"]')]
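
As a rough sketch, the same selector could be combined with the container classes from the question. The markup below is made up for illustration: it assumes the options live in a `select` inside `td.tdAlignR` inside `div.rowDiv5`, which the question implies but does not show. `html.parser` is used here only so the sketch needs no extra parser.

```python
from bs4 import BeautifulSoup

# Hypothetical markup mirroring the class names used in the question.
sample_html = """
<div class="rowDiv5">
  <table><tr><td class="tdAlignR">
    <select>
      <option value="Mango/20181106/UK">06/11/2018</option>
      <option value="Orange/20181104/CN">04/11/2018</option>
      <option value="Apple/20181031/CN">31/10/2018</option>
      <option value="Orange/20181028/CN">28/10/2018</option>
    </select>
  </td></tr></table>
</div>
"""

soup = BeautifulSoup(sample_html, "html.parser")

# Descendant combinators restrict the search to the cell from the question;
# [value^="Orange"] keeps only options whose value starts with "Orange".
selector = 'div.rowDiv5 td.tdAlignR option[value^="Orange"]'
orange_dates = [option.text for option in soup.select(selector)]
print(orange_dates)  # ['04/11/2018', '28/10/2018']
```

Folding the container lookup into the selector replaces the separate `find` calls, at the cost of returning an empty list (rather than raising) when the container is missing.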

Answer 1 (score: 1)

You can use re.compile to specify the desired pattern:

from bs4 import BeautifulSoup as soup
import re
s = """
<option value="Mango/20181106/UK">06/11/2018</option>,
<option value="Orange/20181104/CN">04/11/2018</option>,
<option value="Apple/20181031/CN">31/10/2018</option>,
<option value="Orange/20181028/CN">28/10/2018</option>
"""
# Match only <option> tags whose value attribute starts with "Orange"
results = soup(s, 'html.parser').find_all('option', {'value': re.compile(r'^Orange')})
print(results)

Output:

[<option value="Orange/20181104/CN">04/11/2018</option>, 
 <option value="Orange/20181028/CN">28/10/2018</option>]
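
For comparison, a plain-Python filter on the attribute gives the same matches without a regular expression. This is only a sketch and not part of either answer; the variable names are illustrative.

```python
from bs4 import BeautifulSoup

html = """
<option value="Mango/20181106/UK">06/11/2018</option>,
<option value="Orange/20181104/CN">04/11/2018</option>,
<option value="Apple/20181031/CN">31/10/2018</option>,
<option value="Orange/20181028/CN">28/10/2018</option>
"""

parsed = BeautifulSoup(html, "html.parser")

# Keep (value, text) pairs for options whose value starts with "Orange".
# .get() with a default avoids a KeyError on options without a value attribute.
matches = [
    (option["value"], option.text)
    for option in parsed.find_all("option")
    if option.get("value", "").startswith("Orange")
]
print(matches)
# [('Orange/20181104/CN', '04/11/2018'), ('Orange/20181028/CN', '28/10/2018')]
```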