Question

Here is my code:

html = '''
<td class="ClassName class" width="60%">Data I want to extract<span lang=EN- 
UK style="font-size:12pt;font-family:'arial'"></span></td>
'''


from bs4 import BeautifulSoup

soup = BeautifulSoup(html, 'html.parser')

print(soup.select_one('td').string)

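It returns None. I think this has to do with the empty span tag: does it descend into the span and return that tag's contents instead? If so, I need to either remove the span tag, stop as soon as "Data I want to extract" has been found, or tell it to ignore empty tags.

If there are no empty tags inside the 'td', it works as expected.

Is there a general way to ignore empty tags and step back up a level, rather than ignoring this specific span tag?

Sorry if this is too basic, but I have spent a lot of time searching.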

Answer 1

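Use the .text attribute instead of .string. .string returns None when a tag has more than one child, and here the td contains both the text and the empty span: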

html = '''
<td class="ClassName class" width="60%">Data I want to extract<span lang=EN-
UK style="font-size:12pt;font-family:'arial'"></span></td>
'''

from bs4 import BeautifulSoup

soup = BeautifulSoup(html, 'html.parser')

print(soup.select_one('td').text)

Output:

Data I want to extract
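
To see why .string and .text differ here, the following is a minimal sketch using a simplified, single-line version of the markup from the question (the lang attribute is quoted and the other span attributes are dropped for brevity; the variable names are only illustrative). It relies on the documented behaviour that .string yields a value only when a tag has exactly one child:

```python
from bs4 import BeautifulSoup

# Simplified version of the question's markup (an assumption for brevity)
html = '<td class="ClassName class" width="60%">Data I want to extract<span lang="EN-UK"></span></td>'

soup = BeautifulSoup(html, 'html.parser')
td = soup.select_one('td')

# The td has two children: the text node and the empty <span>
children = list(td.children)

# .string is None because the tag has more than one child
string_value = td.string

# .text (a shortcut for get_text()) joins every string inside the tag
text_value = td.text

# strip=True additionally trims whitespace around each piece of text
stripped = td.get_text(strip=True)

print(len(children), string_value, repr(text_value))
```

get_text(strip=True) can be handy when the real markup contains line breaks or indentation around the text.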

Answer 2

Use .text:

>>> soup.find('td').text
u'Data I want to extract'
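
Note that .text also includes any text found inside nested tags. If the nested tags might not always be empty and only the text directly inside the td is wanted, one possible approach (a sketch, not something proposed in the answers above) is to search for strings with recursive=False. The markup below is a hypothetical variation in which the span contains the word "extra":

```python
from bs4 import BeautifulSoup

# Hypothetical variation: the nested span is not empty
html = '<td>Data I want to extract<span>extra</span></td>'

td = BeautifulSoup(html, 'html.parser').td

# get_text() includes the text of the nested span as well
all_text = td.get_text()

# string=True matches text nodes; recursive=False limits the search
# to direct children of the td, so the span's text is skipped
direct = td.find(string=True, recursive=False)

print(repr(all_text))
print(repr(str(direct)))
```

On older Beautiful Soup 4 releases the string argument was named text, so this sketch assumes a reasonably recent version.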

Python - Extracting data from this HTML tag using BS4 instead of getting None

2 answers: