How to get the text of a single option by its value using BeautifulSoup

Time: 2018-06-27 21:58:31

Tags: beautifulsoup html-parsing

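I need to use BeautifulSoup to get the text of one specific option in a drop-down menu. So far I have only found ways to get the text of all the options. I need to look the option up by the value assigned to it, but the options are not listed in order of value. Here is some of the HTML.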

<option value="9">Aerospace Studies</option>
<option value="200">African American Studies</option>
<option value="10">African Languages</option>
<option value="11">African Studies</option>
<option value="12">Afrikaans</option>
<option value="13">Afro-American Studies</option>
<option value="14">American Indian Studies</option>
<option value="198">American Sign Language</option>
<option value="15">Ancient Near East</option>
<option value="16">Anesthesiology</option>
<option value="17">Anthropology</option>
<option value="19">Applied Linguistics</option>
<option value="20">Arabic</option>
<option value="21">Archaeology</option>
<option value="22">Architecture and Urban Design</option>
<option value="23">Armenian</option>
<option value="24">Art</option>
<option value="25">Art History</option>
<option value="26">Arts and Architecture</option>
<option value="201">Arts Education</option>

1 answer:

Answer 0: (score: 0)

I'm not sure I understood your question, but here is a way to collect every option's value and text and sort the pairs by value.

import bs4
soup = bs4.BeautifulSoup("""<option value="9">Aerospace Studies</option>
<option value="200">African American Studies</option>
<option value="10">African Languages</option>
<option value="11">African Studies</option>
<option value="12">Afrikaans</option>
<option value="13">Afro-American Studies</option>
<option value="14">American Indian Studies</option>
<option value="198">American Sign Language</option>
<option value="15">Ancient Near East</option>
<option value="16">Anesthesiology</option>
<option value="17">Anthropology</option>
<option value="19">Applied Linguistics</option>
<option value="20">Arabic</option>
<option value="21">Archaeology</option>
<option value="22">Architecture and Urban Design</option>
<option value="23">Armenian</option>
<option value="24">Art</option>
<option value="25">Art History</option>
<option value="26">Arts and Architecture</option>
<option value="201">Arts Education</option>""", "html.parser")
# Pair each option's integer value with its text, then sort the pairs by value.
print(sorted((int(opt['value']), opt.text) for opt in soup.find_all('option')))

[(9, u'Aerospace Studies'), (10, u'African Languages'), (11, u'African Studies'), (12, u'Afrikaans'), (13, u'Afro-American Studies'), (14, u'American Indian Studies'), (15, u'Ancient Near East'), (16, u'Anesthesiology'), (17, u'Anthropology'), (19, u'Applied Linguistics'), (20, u'Arabic'), (21, u'Archaeology'), (22, u'Architecture and Urban Design'), (23, u'Armenian'), (24, u'Art'), (25, u'Art History'), (26, u'Arts and Architecture'), (198, u'American Sign Language'), (200, u'African American Studies'), (201, u'Arts Education')]
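If only one option's text is needed, as the question asks, the option can be looked up directly by its value attribute rather than sorting everything. The following is a minimal sketch, not the answerer's method: the shortened HTML excerpt, the surrounding <select> tag, and the names option_text and text_by_value are assumptions made for illustration.

```python
from bs4 import BeautifulSoup

# Shortened excerpt of the dropdown from the question; the <select> wrapper
# is an assumption, since the question shows only the <option> tags.
html = """
<select>
<option value="9">Aerospace Studies</option>
<option value="200">African American Studies</option>
<option value="198">American Sign Language</option>
</select>
"""

soup = BeautifulSoup(html, "html.parser")


def option_text(soup, value):
    """Return the text of the <option> whose value attribute equals `value`, or None."""
    # HTML attribute values are strings, so compare against str(value).
    option = soup.find("option", attrs={"value": str(value)})
    return option.get_text(strip=True) if option else None


# For repeated lookups, a value -> text mapping avoids searching the tree each time.
text_by_value = {
    opt["value"]: opt.get_text(strip=True) for opt in soup.find_all("option")
}

print(option_text(soup, 198))   # American Sign Language
print(text_by_value["200"])     # African American Studies
```

The helper returns None when no option carries the requested value, so the caller can decide how to handle a missing entry instead of hitting an AttributeError.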

Hope this helps.