Extracting attribute values from HTML

Date: 2016-06-15 08:44:37

Tags: python web-scraping beautifulsoup

import requests
from bs4 import BeautifulSoup

r = requests.get("http://www.nepalstock.com.np/marketdepthofcompany/238")
soup = BeautifulSoup(r.content,"lxml")
value_select = soup.select_one("select.form-control")
for val in value_select.find("option")[1:]:
    n = val['value']
    print (n)

Why does the code above print the text from the HTML instead of the attribute values?

1 answer:

Answer 0 (score: 2)

find returns only a single tag (the first match), so you need find_all, which returns every matching tag:

for val in value_select.find_all("option")[1:]:
    n = val['value']
    print (n)
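
To make the difference concrete, here is a minimal sketch that runs offline. The markup in SAMPLE_HTML is made up for illustration and only imitates the real page's dropdown; the built-in "html.parser" is used so that lxml is not required.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the real page's dropdown.
SAMPLE_HTML = """
<select class="form-control">
  <option value="">Choose symbol</option>
  <option value="ACEDBL">Ace Development Bank</option>
  <option value="ADBL">Agricultural Development Bank</option>
</select>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
select = soup.select_one("select.form-control")

first_option = select.find("option")     # a single Tag: the first match only
all_options = select.find_all("option")  # a list-like ResultSet of every match

# Slicing the ResultSet skips the placeholder option at index 0.
values = [opt["value"] for opt in all_options[1:]]
print(values)
```

Only the result of find_all can be sliced like a list; find hands back one Tag, and indexing a Tag looks up an attribute rather than a position.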

Or use a CSS selector to skip the first option:

for val in value_select.select("option + option"):
    print (val['value'])

Both give the same result:

In [1]: import requests

In [2]: from bs4 import BeautifulSoup

In [3]: r = requests.get("http://www.nepalstock.com.np/marketdepthofcompany/238")

In [4]: soup = BeautifulSoup(r.content,"lxml")

In [5]: value_select = soup.select_one("select.form-control")

In [6]: for val in value_select.find_all("option")[1:]:
   ...:         n = val['value']
   ...:         print (n)
   ...:     
ACEDBL
ACEDPO
ADBL
AHPC
ALDBL
ALDBLP
ALICL
ALICLP
APEX
APEXPO
API
ARDBL
ARDBLP
ARUN
ARUNPO
AVU
BARUN
BBBLNP
...................................

In [7]: for val in value_select.select("option + option"):
   ...:         n = val['value']
   ...:         print (n)
   ...:     
ACEDBL
ACEDPO
ADBL
AHPC
ALDBL
ALDBLP
ALICL
ALICLP
APEX
APEXPO
API
ARDBL
ARDBLP
ARUN
ARUNPO
AVU
BARUN
BBBLNP
..........................
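
The selector "option + option" uses the CSS adjacent sibling combinator: it matches every option that comes immediately after another option, which is every option except the first. The sketch below checks that equivalence on made-up markup (SAMPLE_HTML is an assumption, not the real page). Note that this relies on the options being direct siblings; if a page grouped them in optgroup elements, the first option of each group would be skipped as well.

```python
from bs4 import BeautifulSoup

# Hypothetical markup; the real page has many more options.
SAMPLE_HTML = """
<select class="form-control">
  <option value="">Choose symbol</option>
  <option value="ACEDBL">ACEDBL</option>
  <option value="ADBL">ADBL</option>
  <option value="AHPC">AHPC</option>
</select>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
select = soup.select_one("select.form-control")

# Approach 1: collect every option, then drop the first by slicing.
by_slice = [opt["value"] for opt in select.find_all("option")[1:]]

# Approach 2: match only options that directly follow another option.
by_css = [opt["value"] for opt in select.select("option + option")]

print(by_slice == by_css)
```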

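To get the options whose values are integers, select the element with the ID StockSymbol_Select2. There are several select.form-control elements on the page, so you need to specify exactly which one you want: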

import requests
from bs4 import BeautifulSoup

r = requests.get("http://www.nepalstock.com.np/marketdepthofcompany/238")
soup = BeautifulSoup(r.content,"lxml")
value_select = soup.select_one("#StockSymbol_Select2")
for val in value_select.select("option + option"):
    print (val["value"])

That will give you what you want:

In [13]: value_select = soup.select_one("#StockSymbol_Select2")

In [14]: for val in value_select.select("option + option"):
   ....:         print (val["value"])
   ....:     
216
294
397
360
406
660
385
599
262
666
697
..............................
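
As a rough illustration of why the ID matters, the sketch below builds a page with two dropdowns that share the form-control class. The markup is invented: the ID StockSymbol_Select1, the option labels, and the pairing of labels with values are assumptions made for the example; only StockSymbol_Select2 comes from the answer above. The None check is an extra precaution, since select_one returns None when nothing matches.

```python
from bs4 import BeautifulSoup

# Hypothetical page with two dropdowns sharing the same class.
SAMPLE_HTML = """
<select class="form-control" id="StockSymbol_Select1">
  <option value="">Choose symbol</option>
  <option value="ACEDBL">Company A</option>
  <option value="ADBL">Company B</option>
</select>
<select class="form-control" id="StockSymbol_Select2">
  <option value="">Choose symbol</option>
  <option value="216">Company A</option>
  <option value="294">Company B</option>
</select>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# The class selector is ambiguous: select_one returns the first match.
first_match = soup.select_one("select.form-control")
print(first_match["id"])

# The ID selector pins down exactly one element.
target = soup.select_one("#StockSymbol_Select2")
if target is None:
    raise SystemExit("dropdown not found; the page layout may have changed")

# Attribute values are strings, so convert them if integers are needed.
ids = [int(opt["value"]) for opt in target.select("option + option")]
print(ids)
```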