Question

Say my link looks like this:

link = '<a href="some text">...</a>'

Is there a way to retrieve the text from the anchor's href attribute, so that the result looks like this:

hrefText = 'some text'

Thanks in advance.

Answer 1

Here is one way:

import re
print(re.search(r'(?<=<a href=")[^"]+', link).group(0))

Alternatively, with a capture group:

print(re.search(r'<a\s+href="([^"]+)', link).group(1))
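Both one-liners call `.group()` directly on the result of `re.search`, which is `None` when nothing matches, so a string without an anchor raises `AttributeError`. A minimal sketch of a more forgiving variant follows; the helper name `href_from_link` and the extra tolerance for single-quoted values and other attributes before `href` are assumptions for illustration, not part of the original answer.

```python
import re

# Hypothetical helper (not from the original answer).
# Matches the first <a ...> tag and captures the quoted href value;
# group 1 is the quote character, group 2 is the value itself.
HREF_RE = re.compile(r'<a\s+[^>]*?href\s*=\s*(["\'])(.*?)\1',
                     re.IGNORECASE | re.DOTALL)

def href_from_link(link):
    """Return the href of the first anchor in `link`, or None if absent."""
    match = HREF_RE.search(link)
    return match.group(2) if match else None

print(href_from_link('<a href="some text">...</a>'))
```

A regex like this is fine for short, well-formed snippets; for arbitrary HTML a real parser is usually the safer choice.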

Answer 2

While you could split the string or use a regular expression, for a more modular and powerful toolset you can use

BeautifulSoup: https://www.crummy.com/software/BeautifulSoup/

Sample code:

from bs4 import BeautifulSoup
link = '<a href="some text">...</a>'
soup = BeautifulSoup(link, "html.parser")
# href=True keeps only anchors that actually have an href attribute
for anchor in soup.find_all('a', href=True):
    print(anchor['href'])

Or, wrapped in a single function, you could do this:

from bs4 import BeautifulSoup

def getHref(link):
    soup = BeautifulSoup(link, "html.parser")
    return soup.find_all('a', href=True)[0]['href']
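Note that indexing `[0]` raises `IndexError` when the input contains no anchor with an `href`. If installing BeautifulSoup is not an option, the same idea can be sketched with the standard library's `html.parser` module. The class name `HrefCollector` and the function `get_hrefs` below are hypothetical names chosen for this example; it returns a list, which is simply empty when no link is found.

```python
from html.parser import HTMLParser

class HrefCollector(HTMLParser):
    """Collects the href value of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before calling this.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.hrefs.append(value)

def get_hrefs(html):
    """Return all anchor href values found in `html`, in document order."""
    collector = HrefCollector()
    collector.feed(html)
    collector.close()
    return collector.hrefs

print(get_hrefs('<a href="some text">...</a>'))  # ['some text']
```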

Answer 3

You can use bs4 together with the requests library for this.

Hope this helps :)
