Question

I'm having trouble finding a link (an a tag) with a specific title and then extracting only its href.

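I have the code below, but I can't get it to return only the href (whatever sits between the opening " and the closing ") based on the hyperlink text of the link.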

res = requests.get(website_url)
soup = bs4.BeautifulSoup(res.text, 'html.parser')
temp_tag_href = soup.select_one("a[href*=some text]")
sometexthrefonly = temp_tag_href.attrs['href']

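In effect, I'd like it to go through the entire HTML parsed into the soup and return only what is between the href's opening " and closing " for the link whose hyperlink text is "some text".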

So the steps would be:

1: parse html, 
2: look at all the a hrefs tags, 
3: find the href that has the hyperlink text 'some text', 
4: output only what is in between the href " " (not including the 
   "") for that href

Any help is greatly appreciated!

Answer 1

Ahmed

So, after a quick refresher on requests and a look into the BeautifulSoup library, I think you'll want something like the following:

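import bs4
import requests

res = requests.get(website_url)  # website_url: the page to scrape, as in the question
soup = bs4.BeautifulSoup(res.text, 'html.parser')
# compare the visible link text (not the href) and skip anchors that have no href
link = list(filter(lambda x: x.get_text(strip=True) == 'some text', soup.find_all('a', href=True)))[0]
print(link['href']) # since you don't specify output to where, I'll use stdout for simplicity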

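As it turns out, the Beautiful Soup documentation describes a convenient way to access any attribute you want from an HTML element using dictionary lookup syntax. You can also use this library for all kinds of lookups.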
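As a minimal sketch of the dictionary-style attribute access described above, the snippet below runs against an inline HTML string instead of a live page. The HTML sample and the helper name href_for_link_text are made up for illustration; only find_all, get_text and the tag['href'] lookup come from Beautiful Soup itself.

```python
import bs4

# Hypothetical HTML standing in for res.text from the question.
html = """
<p>
  <a href="/first">other link</a>
  <a href="/target">some text</a>
  <a name="no-href-here">some text</a>
</p>
"""

soup = bs4.BeautifulSoup(html, "html.parser")


def href_for_link_text(soup, text):
    """Return the href of the first <a> that has an href and whose
    visible text equals `text`, or None if there is no such link."""
    # href=True restricts the search to anchors that carry an href attribute
    for a in soup.find_all("a", href=True):
        if a.get_text(strip=True) == text:
            return a["href"]  # dictionary lookup syntax on the tag
    return None


result = href_for_link_text(soup, "some text")
print(result)
```

Returning None when nothing matches avoids the IndexError that indexing an empty list with [0] would raise; whether that is the behaviour you want is a design choice.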

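If you are doing web scraping, it may also be useful to switch to a library that supports XPath, which lets you write powerful queries such as //a[@href="some text"][1], which gives you the first link whose URL equals "some text".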
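The query quoted above compares the href attribute itself. To select by link text, as the question asks, the predicate can test the element's text instead. The sketch below assumes lxml as the XPath-capable library (the answer does not name one), and the HTML sample is invented for illustration.

```python
from lxml import html as lxml_html

# Hypothetical page content standing in for res.text from the question.
page = """
<html><body>
  <a href="/first">other link</a>
  <a href="/target"> some text </a>
  <a name="no-href-here">some text</a>
</body></html>
"""

tree = lxml_html.fromstring(page)

# normalize-space(.) trims surrounding whitespace from the link text;
# /@href selects the attribute value rather than the element itself,
# so anchors without an href contribute nothing to the result.
hrefs = tree.xpath('//a[normalize-space(.)="some text"]/@href')

first_href = hrefs[0] if hrefs else None
print(first_href)
```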

Answer 2

This should do the job:

this.DisplayError(data.error);

Beautiful Soup: find href based on hyperlink text
