Question

In the BeautifulSoup documentation I found that decompose() is used to remove a tag, but whenever I try to apply it to my case I always get the same result:

<bound method Tag.decompose of <strong>1 L</strong>>

My goal is to get only '3,78zł / l'. How can I use this method to get the correct result?

My file.py:

from bs4 import BeautifulSoup
import requests


url = "https://www.auchandirect.pl/auchan-warszawa/pl/pepsi-cola-max-niskokaloryczny-napoj-gazowany-o-smaku-cola/p-98502176"
# Fetch the page once with requests and parse that response
# (the original also fetched it a second time with urlopen and never used r).
r = requests.get(url, headers={'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8'}, timeout=15)
soup = BeautifulSoup(r.text, 'lxml')



products_links_price = soup.find(class_='packaging')


print(products_links_price.strong.decompose)

Result:

<bound method Tag.decompose of <strong>1 L</strong>>

When I try it another way (getting the text inside the "strong" tag), everything works fine:

print(products_links_price.strong.text)

Result:

'1 L'

Answer 1

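To get only '3,78zł / l' as the result, replace print(products_links_price.strong.decompose) with the two lines below. Without the parentheses, decompose is never called: print just displays the bound method object itself, which is exactly the <bound method Tag.decompose of <strong>1 L</strong>> output in the question.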

products_links_price.strong.decompose()
print(products_links_price.text.strip())

This will output:

3,78zł / l
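To try this fix without hitting the live shop page, here is a minimal offline sketch. The HTML string is hypothetical: it assumes the packaging element is a <p class="packaging"> wrapping a <strong> size label followed by the price text, which matches what both answers imply but may not be the page's exact markup. It uses the built-in "html.parser" so lxml is not required.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, assumed to mirror the product page's packaging element.
html = '<p class="packaging"><strong>1 L</strong> 3,78zł / l</p>'

soup = BeautifulSoup(html, "html.parser")
products_links_price = soup.find(class_="packaging")

# Call decompose() WITH parentheses so <strong>1 L</strong> is actually removed.
products_links_price.strong.decompose()

# Only the price text is left inside the <p>; strip() drops the leading space.
price_per_litre = products_links_price.text.strip()
print(price_per_litre)
```

The decompose() call is a statement on its own line because it changes the tree in place; its return value is not useful.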

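Whenever you print the result of a method or function call and get None, ask yourself whether that method or function returns a value at all. If it doesn't, Python returns None by default. That is the case with decompose(), because all it does is recursively destroy the Tag and remove it from the Tag tree: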

def decompose(self):
    """Recursively destroys the contents of this tree."""
    self.extract()
    i = self
    while i is not None:
        next = i.next_element
        i.__dict__.clear()
        i.contents = []
        i = next
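The difference between referencing decompose and calling it, and why the call gives back None, can be seen in a short offline sketch. It again assumes hypothetical markup for the packaging element. It also shows extract(), which decompose() calls first in the source above: extract() on its own detaches the tag but returns it, which helps if you still want the "1 L" text after removing it.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, assumed to resemble the product page's packaging element.
HTML = '<p class="packaging"><strong>1 L</strong> 3,78zł / l</p>'

soup = BeautifulSoup(HTML, "html.parser")
strong = soup.find("p", class_="packaging").strong

# No parentheses: this only references the method; nothing is removed yet.
method_ref = strong.decompose
print(method_ref)  # the bound method object, like the output in the question
still_there = soup.find("strong") is not None

# With parentheses the method runs, destroys the tag, and returns None.
result = strong.decompose()
print(result)  # None
gone = soup.find("strong") is None

# extract() also detaches a tag, but hands it back instead of destroying it.
soup2 = BeautifulSoup(HTML, "html.parser")
pack = soup2.find("p", class_="packaging")
removed = pack.strong.extract()
print(removed.text, "|", pack.text.strip())  # 1 L | 3,78zł / l
```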

Answer 2

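According to the BeautifulSoup documentation, the decompose() method doesn't return anything. You have to select the p element that contains the strong element. Then decompose the strong tag inside the selected element. Once the strong tag is gone, you can extract the text you want.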

pack = soup.find('p', class_='packaging')
pack.strong.decompose()
print(pack.text.strip())  # prints the desired output: 3,78zł / l
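Both answers modify the parsed tree. If you would rather leave the strong tag in place, one alternative (a different technique from decompose(), sketched here under the same hypothetical-markup assumption) is to collect only the text nodes that are direct children of the p element.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, assumed to resemble the product page's packaging element.
html = '<p class="packaging"><strong>1 L</strong> 3,78zł / l</p>'
soup = BeautifulSoup(html, "html.parser")
pack = soup.find("p", class_="packaging")

# recursive=False limits the search to pack's direct children, and
# string=True matches text nodes only, so "1 L" inside <strong> is skipped.
own_text = "".join(pack.find_all(string=True, recursive=False)).strip()
print(own_text)

# The tree is untouched: <strong> is still available.
print(pack.strong.text)
```

This depends on the price being a direct text child of the p element; if the real page nests it in another tag, the search would need adjusting.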

Hope this helps! Cheers!

decompose() always returns "<bound method Tag.decompose of <strong></strong>>" (Python, BeautifulSoup)
