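How to select all stateid values and state names from this inspected element

Time: 2016-03-20 13:41:37

Tags: python django beautifulsoup

I want to use BeautifulSoup to extract every state name, together with its state ID, from the markup below. Can anyone tell me how to write a BeautifulSoup script that scrapes all the state names and state IDs?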

    <select id="stateId" class="states" name="state" required="required">
<option value="">Select State</option>
<option value="Andaman and Nicobar Islands" stateid="1">Andaman and Nicobar Islands</option>
<option value="Andhra Pradesh" stateid="2">Andhra Pradesh</option>
<option value="Arunachal Pradesh" stateid="3">Arunachal Pradesh</option>
<option value="Assam" stateid="4">Assam</option>
<option value="Bihar" stateid="5">Bihar</option>
<option value="Chandigarh" stateid="6">Chandigarh</option>
<option value="Chhattisgarh" stateid="7">Chhattisgarh</option>
<option value="Dadra and Nagar Haveli" stateid="8">Dadra and Nagar Haveli</option>
<option value="Daman and Diu" stateid="9">Daman and Diu</option>
<option value="Delhi" stateid="10">Delhi</option>
<option value="Goa" stateid="11">Goa</option>
<option value="Gujarat" stateid="12">Gujarat</option>
<option value="Haryana" stateid="13">Haryana</option>
<option value="Himachal Pradesh" stateid="14">Himachal Pradesh</option>

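2 Answers:

Answer 0: (Score: 0)

For this I wrote a list comprehension that loops over all the "option" elements found by findAll and collects each one's state ID and state name: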

[(x["stateid"], x["value"]) for x in bs.findAll("option") if x["value"] != ""]

I used the list comprehension like this:

>>> import bs4
>>> bs = bs4.BeautifulSoup(html, "html.parser")  # html: a string holding the markup from the question
>>> [(x["stateid"], x["value"]) for x in bs.findAll("option") if x["value"] != ""]
[('1', 'Andaman and Nicobar Islands'), ('2', 'Andhra Pradesh'), ('3', 'Arunachal Pradesh'), ('4', 'Assam'), ('5', 'Bihar'), ('6', 'Chandigarh'), ('7', 'Chhattisgarh'), ('8', 'Dadra and Nagar Haveli'), ('9', 'Daman and Diu'), ('10', 'Delhi'), ('11', 'Goa'), ('12', 'Gujarat'), ('13', 'Haryana'), ('14', 'Himachal Pradesh')]

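It returns a list of tuples in which the first element of each tuple is the state ID and the second is the state name. The condition if x["value"] != "" skips the blank placeholder option at the start, which has no stateid attribute.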
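
For anyone who wants to run the idea above end to end, here is a minimal self-contained sketch. It assumes a shortened copy of the question's markup (the closing tag and the variable names html, soup and states are mine, not from the original post) and builds a dict instead of a list of tuples. It filters on the stateid attribute itself rather than on value, which is one possible variation and not the answer's exact code.

```python
from bs4 import BeautifulSoup

# Shortened copy of the markup from the question; the closing </select>
# is added here only so the sample is complete.
html = """
<select id="stateId" class="states" name="state" required="required">
<option value="">Select State</option>
<option value="Andaman and Nicobar Islands" stateid="1">Andaman and Nicobar Islands</option>
<option value="Andhra Pradesh" stateid="2">Andhra Pradesh</option>
<option value="Assam" stateid="4">Assam</option>
</select>
"""

soup = BeautifulSoup(html, "html.parser")

# Tag.get() returns None when the attribute is missing, so the
# placeholder option (no stateid) is skipped without a KeyError.
states = {
    int(option["stateid"]): option.get_text(strip=True)
    for option in soup.find_all("option")
    if option.get("stateid")
}

print(states)
```

Converting the ID to int is optional; BeautifulSoup always returns attribute values as strings.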

Answer 1: (Score: 0)

To improve on what OrangeFlash81 suggested, you can pass value=lambda x: x to find_all() so that the first "empty" option is never extracted:

select = soup.find("select", id="stateId")
options = [(option["stateid"], option["value"]) 
           for option in select.find_all("option", value=lambda x: x)]
print(options)
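
To make the filter above less cryptic: when a keyword argument to find_all() is a function, BeautifulSoup calls it with that attribute's value and keeps the tag only if the result is truthy. An empty string is falsy, so the placeholder is dropped. The sketch below uses a trimmed sample of the markup (my own assumption) and also shows stateid=True, which matches any tag that has the attribute at all, as a possible alternative.

```python
from bs4 import BeautifulSoup

# Trimmed sample of the question's markup, for illustration only.
html = """<select id="stateId">
<option value="">Select State</option>
<option value="Goa" stateid="11">Goa</option>
<option value="Gujarat" stateid="12">Gujarat</option>
</select>"""

soup = BeautifulSoup(html, "html.parser")
select = soup.find("select", id="stateId")

# The function receives the attribute value; "" is falsy, so the
# placeholder option is filtered out.
by_value = select.find_all("option", value=lambda v: v)

# True matches every tag that carries the attribute, whatever its value.
by_attr = select.find_all("option", stateid=True)

pairs_by_value = [(o["stateid"], o["value"]) for o in by_value]
pairs_by_attr = [(o["stateid"], o["value"]) for o in by_attr]
print(pairs_by_value)
print(pairs_by_attr)
```

On this sample both filters select the same two options; they would differ only if some option had a non-empty value but no stateid.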

Alternatively, use a CSS selector and slice the result set to skip the first "empty" option:

options = [(option["stateid"], option["value"])
           for option in soup.select("#stateId option")[1:]]
print(options)
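
Slicing with [1:] relies on the placeholder always being the first option. A variation that does not depend on position is a CSS attribute selector, which select() supports. This is a sketch on a trimmed sample of the markup (the sample and variable names are assumptions, not part of the original answer):

```python
from bs4 import BeautifulSoup

# Trimmed sample of the question's markup, for illustration only.
html = """<select id="stateId">
<option value="">Select State</option>
<option value="Delhi" stateid="10">Delhi</option>
<option value="Haryana" stateid="13">Haryana</option>
</select>"""

soup = BeautifulSoup(html, "html.parser")

# option[stateid] matches only <option> tags that have a stateid
# attribute, so the placeholder is excluded wherever it appears.
options = [(option["stateid"], option["value"])
           for option in soup.select("#stateId option[stateid]")]
print(options)
```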