<tbody>
<tr class="abc bg1">...</tr>
<tr class="bg1">...</tr>
<td class="no">...</td>
<td>sampletext</td>
<td class="title">...</td>
<tr class="bg2">...</tr>
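This sample has three class values: 'abc bg1', 'bg1', and 'bg2'.
I only want the 'bg1' and 'bg2' tags.
So I used soup.select('tbody > tr.bg1 > td'),
but this returns the child 'td' tags of both 'abc bg1' and 'bg1'. How can I get the result I want?
Also, for 'bg1', I want to extract only the text, without the other tags, e.g. sampletext <- only this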
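A minimal CSS-selector sketch, assuming bs4 4.7 or later (where `select()` is backed by Soup Sieve and supports `:not()`). It also assumes the `<td>` cells belong inside the plain `bg1` row; the question's snippet places them outside any `<tr>`, so the markup and placeholder cell text below are a cleaned-up guess. `:not(.abc)` drops the row whose class list also contains `abc`, and `td:not([class])` keeps only the cell that has no class attribute.

```python
from bs4 import BeautifulSoup

# Cleaned-up guess at the question's markup (assumption: the <td> cells
# sit inside the plain "bg1" row; cell texts other than sampletext are made up).
html_doc = """<table><tbody>
<tr class="abc bg1"><td>skip me</td></tr>
<tr class="bg1"><td class="no">1</td><td>sampletext</td><td class="title">t</td></tr>
<tr class="bg2"><td>other row</td></tr>
</tbody></table>"""

soup = BeautifulSoup(html_doc, "html.parser")

# Rows with class bg1 that do not also have abc, plus the bg2 rows (document order).
rows = soup.select("tbody > tr.bg1:not(.abc), tbody > tr.bg2")
print([list(row["class"]) for row in rows])

# Text of the class-less cell in the plain bg1 row only.
texts = [td.get_text() for td in soup.select("tbody > tr.bg1:not(.abc) > td:not([class])")]
print(texts)
```

This prints `[['bg1'], ['bg2']]` and then `['sampletext']`.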
Answer 0 (score: 0)
from bs4 import BeautifulSoup
html_str = """<tbody>
<tr class="abc bg1">...</tr>
<tr class="bg1">...</tr>
<td class="no">...</td>
<td>sampletext</td>
<td class="title">...</td>
<tr class="bg2">...</tr></tbody>"""
soup = BeautifulSoup(html_str, "html.parser")
# class is multi-valued, so "abc bg1" also matches; index 1 is the plain "bg1" row
bg1 = soup.findAll('tr', attrs={'class': 'bg1'})[1].text
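If you use .findAll, it finds every tag with that class name; because class is a multi-valued attribute, that includes the 'abc bg1' row. It returns a list, so just index into it for the tr you want.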
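A small sketch of why the index `[1]` is needed, and of one way to avoid it. Passing a class string to `find_all` matches any tag whose class list contains that value, so both rows match. Passing a function instead lets you require the class list to be exactly `["bg1"]`; the function name `is_plain_bg1` is hypothetical and only for illustration.

```python
from bs4 import BeautifulSoup

html_doc = """<tbody>
<tr class="abc bg1">...</tr>
<tr class="bg1">...</tr>
<tr class="bg2">...</tr>
</tbody>"""
soup = BeautifulSoup(html_doc, "html.parser")

# A class string matches any tag whose class list contains it: both bg1 rows match.
loose = soup.find_all("tr", class_="bg1")
print(len(loose))

# Hypothetical helper: find_all calls it with each tag and keeps those returning True.
def is_plain_bg1(tag):
    return tag.name == "tr" and tag.get("class") == ["bg1"]

exact = soup.find_all(is_plain_bg1)
print(len(exact))
```

This prints `2` and then `1`; the single exact match is the same tag as `loose[1]`.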
Update
If you want an element inside bg1, make another find call, like this:
sample_text = soup.findAll('td')[1].text
# This gives you "sampletext".
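The snippet above searches the whole document for `td`. A sketch of the "another find" idea that first locates the row and then searches only inside it, assuming the `<td>` cells actually sit inside the plain `bg1` row (the question's markup leaves them outside any `<tr>`, so this markup is a guess):

```python
from bs4 import BeautifulSoup

# Assumption: the <td> cells are inside the plain "bg1" row.
html_doc = """<table><tbody>
<tr class="abc bg1"><td>skip me</td></tr>
<tr class="bg1"><td class="no">1</td><td>sampletext</td><td class="title">t</td></tr>
<tr class="bg2"><td>other row</td></tr>
</tbody></table>"""
soup = BeautifulSoup(html_doc, "html.parser")

# Index 1 skips the "abc bg1" row, which also matches class "bg1".
row = soup.find_all("tr", class_="bg1")[1]

# Search only within that row, then take the text of the second cell.
cells = row.find_all("td")
sample_text = cells[1].get_text(strip=True)
print(sample_text)
```

Scoping the search to `row` means cells from other rows can never shift the index.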
Answer 1 (score: 0)
Here is a way to identify all tags that have 'bg1' OR 'bg2' but not 'abc':
from bs4 import BeautifulSoup
html_doc = '''<tbody>
<tr class="abc bg1">...</tr>
<tr class="bg1">...</tr>
<td class="no">...</td>
<td>sampletext</td>
<td class="title">...</td>
<tr class="bg2">...</tr>
</tbody>'''
soup = BeautifulSoup(html_doc, 'html.parser')
# We can look for all tags that are "tr" tags.
for tag in soup.find_all('tr'):
    # Each tag has attributes. We can reference the attrs dictionary
    # using the attribute name as the key.
    if 'abc' in tag.attrs['class']:
        continue
    else:
        print(tag)
Output:
<tr class="bg1">...</tr>
<tr class="bg2">...</tr>
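The loop above only skips rows containing `abc`; it does not check for `bg1` or `bg2`, and `tag.attrs['class']` raises `KeyError` on a `<tr>` with no class attribute. A sketch of a stricter variant of the same idea using `tag.get('class', [])` and set operations; the class-less row and the `bg3` row are made up to show the filtering:

```python
from bs4 import BeautifulSoup

html_doc = """<tbody>
<tr class="abc bg1">...</tr>
<tr class="bg1">...</tr>
<tr>no class</tr>
<tr class="bg3">...</tr>
<tr class="bg2">...</tr>
</tbody>"""
soup = BeautifulSoup(html_doc, "html.parser")

wanted = {"bg1", "bg2"}
matches = []
for tag in soup.find_all("tr"):
    # .get() returns the default instead of raising KeyError when class is missing.
    classes = set(tag.get("class", []))
    # Keep the row only if it has bg1 or bg2 and does not also have abc.
    if classes & wanted and "abc" not in classes:
        matches.append(tag)

for tag in matches:
    print(tag)
```

This prints the same two rows as the original answer, while ignoring the class-less and `bg3` rows.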