Question

<li class="actualPrice price fakeLink " data-automation="actual-price">
       <span class="visuallyhidden">Hello world</span>
Some text I want to extract
</li>

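This is some HTML. I want to extract the text "Some text I want to extract", and I do not want to extract "Hello world".

I tried something like find('span') followed by next_sibling, but I could not get it to work.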

for a in soup.find_all('li', 'actualPrice'):
    print(a.get_text())

This gives me both "Hello world" and "Some text I want to extract". Is there any way to extract only "Some text I want to extract"?

Answer 1

If you want to extract the next element after the span tag, you can use .next:

>>> for a in soup.find_all('li', 'actualPrice'):
        print(a.span.next.next)
Some text I want to extract
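
The question mentions that next_sibling did not work. A minimal sketch of that route, assuming the HTML from the question and Python's built-in html.parser: calling next_sibling on the span tag itself (rather than on its string) should return the text node that follows the closing span tag. The function name text_after_span is hypothetical, introduced only for this illustration.

```python
from bs4 import BeautifulSoup

# HTML copied from the question.
html = """<li class="actualPrice price fakeLink " data-automation="actual-price">
       <span class="visuallyhidden">Hello world</span>
Some text I want to extract
</li>"""


def text_after_span(markup):
    """Return the text that follows the first <span> in each matching <li>.

    Hypothetical helper: assumes every matching <li> contains a <span>
    that is directly followed by a text node.
    """
    soup = BeautifulSoup(markup, "html.parser")
    results = []
    for li in soup.find_all("li", "actualPrice"):
        # li.span is the first <span> inside the <li>; its next sibling
        # is the text node right after the closing </span>.
        sibling = li.span.next_sibling
        results.append(str(sibling).strip())
    return results


for text in text_after_span(html):
    print(text)
```

The strip() call is there because the raw text node keeps the surrounding newlines.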

Answer 2

As another approach, you can use stripped_strings:

for li in soup.find_all('li', 'actualPrice'):
    _, text_you_want = li.stripped_strings
    print(text_you_want)

Output:

Some text I want to extract
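
Note that the tuple unpacking above only works when the li contains exactly two non-empty strings. As a hedged alternative, the sketch below collects only the text nodes that are direct children of the li, so anything nested inside the span (or any other child tag) is skipped. It assumes the HTML from the question, the built-in html.parser, and a Beautiful Soup 4 version that accepts the string= argument (older versions call it text=). The function name direct_text is hypothetical.

```python
from bs4 import BeautifulSoup

# HTML copied from the question.
html = """<li class="actualPrice price fakeLink " data-automation="actual-price">
       <span class="visuallyhidden">Hello world</span>
Some text I want to extract
</li>"""


def direct_text(markup):
    """Return each matching <li>'s own text, ignoring text in child tags.

    Hypothetical helper written for this illustration.
    """
    soup = BeautifulSoup(markup, "html.parser")
    results = []
    for li in soup.find_all("li", "actualPrice"):
        # string=True matches text nodes; recursive=False restricts the
        # search to direct children, so "Hello world" inside <span> is
        # never visited.
        pieces = li.find_all(string=True, recursive=False)
        results.append("".join(pieces).strip())
    return results


for text in direct_text(html):
    print(text)
```

This version does not depend on how many strings the li holds, at the cost of dropping all nested text, not just the span's.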

Beautiful Soup 4: extracting text without tags