Question

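I am scraping a vendor directory for links. I created a soup and used the find_all method to isolate all the data I want. However, the string I need is nested further inside the soup. I know find_all returns a list, but I need to refine that list further to get what I need. Any help is appreciated, because I'm about ready to throw my laptop across the room. Below is the code I have so far.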

I'm very new to the coding world. I have a decent understanding of Python, but only a basic understanding of Beautiful Soup.

from requests import get
from bs4 import BeautifulSoup

URL = get('https://www......')  # fetch the page I want to work with
soup = BeautifulSoup(URL.text, 'html.parser')  # make the soup
IsoUrl = soup.find_all('a', class_='xmd-listing-company-name')  # isolate the <a> tags of the links I need (note class_, since class is a Python keyword)

This is more or less where I get stuck. The isolation above gives me a list of items like the following. This is just one item from the list.

<a class="xmd-listing-company-name" href="/rated.company.html" itemprop="url"><span itemprop="name">Company</span></a>

There are 10+ items like the one above in the list. I want to pull '/rated.company.html' out of each one and append the results to a list that I can iterate over.

Any guidance is greatly appreciated. Let me know if I need to clarify anything.

Answer 1

You can simply loop over the results of find_all and extract the href from each tag, like this:

results = [iso['href'] for iso in IsoUrl]

# >>> ["/rated.company.html", ...]
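For a version you can run without hitting the real site, here is a minimal sketch of the same idea. It assumes markup shaped like the item in the question; the second and third entries in the sample HTML and the BASE_URL value are made up for illustration. It uses Tag.get so that a tag with no href is skipped instead of raising a KeyError, and urljoin in case you want absolute URLs to request later.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

# Sample markup modeled on the item shown in the question.
# The second and third entries are invented for illustration.
html = """
<a class="xmd-listing-company-name" href="/rated.company.html" itemprop="url"><span itemprop="name">Company</span></a>
<a class="xmd-listing-company-name" href="/rated.other.html" itemprop="url"><span itemprop="name">Other</span></a>
<a class="xmd-listing-company-name"><span itemprop="name">No link</span></a>
"""

soup = BeautifulSoup(html, "html.parser")
IsoUrl = soup.find_all("a", class_="xmd-listing-company-name")

# Each item is a Tag, not a string. Tag.get returns None when the
# attribute is missing, so tags without an href are filtered out.
hrefs = [a.get("href") for a in IsoUrl if a.get("href")]

# Hypothetical base URL; replace it with the site you are scraping.
BASE_URL = "https://www.example.com/"
full_urls = [urljoin(BASE_URL, h) for h in hrefs]

for url in full_urls:
    print(url)
```

The key point is that find_all gives you Tag objects, so attributes are read with dictionary-style access (iso['href'] or iso.get('href')) and no string slicing is needed.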
