Question

I have HTML (parsed into a soup) that looks like this:

<label for="02" class="highlited">"Some text here"</label>
<span class="type3 type3-display">
<label for="01" class="highlited">"Some text here"</label>
<span class="type1 type1-display">
<label> Somete text here </label>
<span class="type999 type999-display">
<span class="type1 type1-display">

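I need to get both the labels and the spans from the page, but with different search criteria for each.

For labels, I only want those that have a for= attribute (with any value).
For spans, I only want those whose class contains one of the words in a list, for example:

myList = ['type1', 'type2', 'type3']

The order in which the elements appear on the page must be preserved.

The result I need looks like this: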

<label for="02" class="highlited">"Some text here"</label>
<span class="type3 type3-display">
<label for="01" class="highlited">"Some text here"</label>
<span class="type1 type1-display">
<span class="type1 type1-display">

To find the labels that have anything after "for=", I use the following code:

soup.find_all('label', {'for': re.compile('.*')}) # it works as expected

But now I also need to find all the spans with specific class names, while keeping the order from the web page.

I tried this, but it did not work:

soup.find_all(['label', 'span'], [{'for': re.compile('.*')}, {'class': 'type1'}], recursive=False) # here I just used {'class': 'type1'} because I don't know how to pass a list to soup to search for a match

Thanks in advance!

Edit: I also tried combining two find_all() searches with +, but then I lose the order.
Edit 2: spelling.

Answer 1

You can also do this without a regular expression.

from bs4 import BeautifulSoup
data='''<label for="02" class="highlited">"Some text here"</label>
<span class="type3 type3-display"></span>
<label for="01" class="highlited">"Some text here"</label>
<span class="type1 type1-display"></span>
<label> Somete text here </label>
<span class="type999 type999-display"></span>
<span class="type1 type1-display"></span>'''

myList = ['type1', 'type2', 'type3']
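soup = BeautifulSoup(data, 'html.parser')

# find_all() with no arguments walks every tag in document order
for item in soup.find_all():
    if item.name == 'label' and 'for' in item.attrs:
        print(item)
    # .get() avoids a KeyError on spans that have no class attribute
    elif item.name == 'span' and any(c in myList for c in item.get('class', [])):
        print(item)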

Output:

<label class="highlited" for="02">"Some text here"</label>
<span class="type3 type3-display"></span>
<label class="highlited" for="01">"Some text here"</label>
<span class="type1 type1-display"></span>
<span class="type1 type1-display"></span>
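
As a possible variation on the loop above, the same filtering can probably be pushed into the search itself, either by passing a function to find_all() or by using a CSS selector list with select(). Both should return matches in document order. This is only a sketch: the names html, my_list and wanted are made up for illustration, and select() assumes a BeautifulSoup version that ships with the soupsieve selector backend (4.7 or later).

```python
from bs4 import BeautifulSoup

html = '''<label for="02" class="highlited">"Some text here"</label>
<span class="type3 type3-display"></span>
<label for="01" class="highlited">"Some text here"</label>
<span class="type1 type1-display"></span>
<label> Somete text here </label>
<span class="type999 type999-display"></span>
<span class="type1 type1-display"></span>'''

my_list = ['type1', 'type2', 'type3']  # class names to accept on spans
soup = BeautifulSoup(html, 'html.parser')


def wanted(tag):
    """Return True for labels with a for= attribute and spans with a listed class."""
    if tag.name == 'label':
        return tag.has_attr('for')
    if tag.name == 'span':
        # 'class' is multi-valued, so bs4 exposes it as a list of strings
        return any(cls in my_list for cls in tag.get('class', []))
    return False


# Option 1: find_all() calls the function once per tag, in document order
by_function = soup.find_all(wanted)

# Option 2: one CSS selector list, i.e. 'label[for], span.type1, span.type2, span.type3'
selector = 'label[for], ' + ', '.join(f'span.{cls}' for cls in my_list)
by_selector = soup.select(selector)

for tag in by_function:
    print(tag)
```

The function version is easier to extend with arbitrary conditions, while the selector version is shorter when the criteria can be written as plain CSS.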

Finding multiple tags with multiple search parameters using BeautifulSoup
