Scraping values from a dropdown list

Asked: 2016-11-11 02:49:01

Tags: python selenium web-scraping beautifulsoup

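I am trying to scrape the values and the text from a dropdown element on a web page, using Python with a combination of Selenium and Beautiful Soup.

I can get the text, but I cannot get the values through the get_attribute command.

Printing the element I located on the page works and returns its markup (shown further down):

print(price)

But the print statement that tries to get the value raises an error:

TypeError: 'NoneType' object is not callable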

price=soup.find("select",{"id":"space-prices"})
print(price)
print(price.text)
print(price.get_attribute('value'))

The output of print(price) is:

<select class="pricing-bar-select" id="space-prices" name="space-prices"><option selected="selected" value="£360">Per Day</option>
<option value="£1,260">Per Week</option>
<option value="£5,460">Per Month</option>
<option value="£16,380">Per Quarter</option>
<option value="£65,520">Per Year</option></select>

The URL of the web page is:

https://www.appearhere.co.uk/spaces/north-kensington-upcycling-store-and-cafe

2 Answers:

Answer 0 (score: 3)

Try this:

from selenium import webdriver
from bs4 import BeautifulSoup

url = "https://www.appearhere.co.uk/spaces/north-kensington-upcycling-store-and-cafe"

# Load the page in a real browser and grab the HTML it ends up with
driver = webdriver.Chrome()
driver.maximize_window()
driver.get(url)
content = driver.page_source
driver.quit()

soup = BeautifulSoup(content, "html.parser")
price = soup.find("select", {"id": "space-prices"})
options = price.find_all("option")
options1 = [y.text for y in options]         # visible labels, e.g. "Per Day"
values = [o.get("value") for o in options]   # "value" attribute of each <option>

# Pair each label with its value instead of assuming exactly five options
for text, value in zip(options1, values):
    print(text, value)

This prints:

Per Day £360
Per Week £1,260
Per Month £5,460
Per Quarter £16,380
Per Year £65,520

Hope this is what you were after.

Answer 1 (score: 1)

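The error occurs because price.get_attribute evaluates to None. get_attribute is a method of Selenium's WebElement, not of a BeautifulSoup tag. BeautifulSoup treats an unknown attribute name as a search for a child tag with that name, finds no <get_attribute> tag, and returns None. None is not a function you can call, hence the error. If you drop the parentheses and print price.get_attribute, you will see None.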
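The behaviour described above can be reproduced without a browser. The following is a minimal sketch, assuming bs4 is installed; the html string is a trimmed, static copy of the markup printed in the question (the live page may differ), and the variable names missing, error_message, select_value and select_id are only illustrative.

```python
from bs4 import BeautifulSoup

# Static, trimmed copy of the <select> from the question (assumption:
# used here instead of the live page so the sketch runs offline).
html = """
<select class="pricing-bar-select" id="space-prices" name="space-prices">
<option selected="selected" value="£360">Per Day</option>
<option value="£1,260">Per Week</option>
</select>
"""

soup = BeautifulSoup(html, "html.parser")
price = soup.find("select", {"id": "space-prices"})

# An unknown attribute name on a Tag is treated as a child-tag lookup,
# so this behaves like price.find("get_attribute") and yields None.
missing = price.get_attribute
print(missing)

# Calling that None is what raises the TypeError from the question.
error_message = ""
try:
    price.get_attribute("value")
except TypeError as exc:
    error_message = str(exc)
    print(error_message)

# Tag.get() is how BeautifulSoup reads an attribute. The <select> has
# an "id" but no "value", so the second lookup returns None.
select_id = price.get("id")
select_value = price.get("value")
```

Here price.get("value") returning None is a separate issue from the TypeError: it only reflects that the attribute lives on the <option> children, not on the <select> itself.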

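Also, the <select> tag has no "value" attribute in the first place. What you have grabbed is the <select> tag together with all of its children. It is each child of the <select> tag (each <option> tag) that has a "value" attribute. If you are trying to get the values of all the <option> tags inside the <select>, do the following: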

price=soup.find("select",{"id":"space-prices"})

# get all <options> in a list
options = price.find_all("option")

# for each element in that list, pull out the "value" attribute
values = [o.get("value") for o in options]
print(values)
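# Python 3 prints: ['£360', '£1,260', '£5,460', '£16,380', '£65,520']
# Python 2 prints the same strings as [u'\xa3360', u'\xa31,260', u'\xa35,460', u'\xa316,380', u'\xa365,520']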
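If the label and the value are needed together, the same find_all result can be turned into a dictionary. This is only a sketch under assumptions: it runs against a static copy of the markup from the question rather than the live page, and the names prices, default and default_label are made up for illustration.

```python
from bs4 import BeautifulSoup

# Static copy of the <select> printed in the question (assumption:
# the live page may have changed since it was posted).
html = """
<select class="pricing-bar-select" id="space-prices" name="space-prices">
<option selected="selected" value="£360">Per Day</option>
<option value="£1,260">Per Week</option>
<option value="£5,460">Per Month</option>
<option value="£16,380">Per Quarter</option>
<option value="£65,520">Per Year</option>
</select>
"""

soup = BeautifulSoup(html, "html.parser")
select = soup.find("select", {"id": "space-prices"})

# Map each option's visible label to its "value" attribute.
prices = {
    option.get_text(strip=True): option.get("value")
    for option in select.find_all("option")
}

# selected=True matches any <option> that carries a "selected"
# attribute, whatever its value; here that is the pre-selected entry.
default = select.find("option", selected=True)
default_label = default.get_text(strip=True)

for label, value in prices.items():
    print(label, value)
```

A dictionary keeps the pairing explicit, so looking up prices["Per Week"] does not depend on the order of the options in the markup.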