I have a list of strings that I scraped from the internet, and I want to extract the 'href' from each of them:
<li class="subnav__item"><a class="subnav__link " href="/red-wine">Red Wine</a></li>
<li class="subnav__item"><a class="subnav__link " href="/white-wine">White Wine</a></li>
<li class="subnav__item"><a class="subnav__link " href="/rose-wine">Rosé Wine</a></li>
<li class="subnav__item"><a class="subnav__link " href="/fine-wine">Fine Wine</a></li>
For example, I would like to iterate over the list and dynamically extract
/red-wine
from
<li class="subnav__item"><a class="subnav__link " href="/red-wine">Red Wine</a></li>
Thanks!
Answer 0 (score: 1)
You can use lxml, like this:
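from lxml import html
import requests

# Download the page and parse it into an element tree
response = requests.get('<your url>')
tree = html.fromstring(response.text)
# The class "subnav__item" is on the <li>, so select the <a> inside it
href = tree.xpath('//li[@class="subnav__item"]/a/@href')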
This gets the href values from the items with class "subnav__item".
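Since the question starts from a list of strings rather than a URL, the same idea can be applied to each string directly. The following is only a minimal sketch: the `items` list and the `extract_hrefs` helper are hypothetical names made up for illustration, and it assumes each string is a small HTML fragment like the ones in the question.

```python
from lxml import html

# Hypothetical input: the scraped strings from the question.
items = [
    '<li class="subnav__item"><a class="subnav__link " href="/red-wine">Red Wine</a></li>',
    '<li class="subnav__item"><a class="subnav__link " href="/white-wine">White Wine</a></li>',
    '<li class="subnav__item"><a class="subnav__link " href="/rose-wine">Rosé Wine</a></li>',
    '<li class="subnav__item"><a class="subnav__link " href="/fine-wine">Fine Wine</a></li>',
]


def extract_hrefs(snippets):
    """Collect the href of every <a> found in each HTML fragment (illustrative helper)."""
    hrefs = []
    for snippet in snippets:
        # Parse the fragment into an element
        element = html.fromstring(snippet)
        # Relative XPath: every <a> below this element, returning its href attribute
        hrefs.extend(element.xpath('.//a/@href'))
    return hrefs


hrefs = extract_hrefs(items)
print(hrefs)
```

The relative path `.//a/@href` avoids depending on the class attribute, which in the question's markup has a trailing space on the link ("subnav__link ").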
Answer 1 (score: 1)
You can also use Beautiful Soup to get the text you want:
from bs4 import BeautifulSoup
data = '\
<li class="subnav__item"><a class="subnav__link " href="/red-wine">Red Wine</a></li>\
<li class="subnav__item"><a class="subnav__link " href="/white-wine">White Wine</a></li>\
<li class="subnav__item"><a class="subnav__link " href="/rose-wine">Rosé Wine</a></li>\
<li class="subnav__item"><a class="subnav__link " href="/fine-wine">Fine Wine</a></li>'
soup = BeautifulSoup(data, "html.parser")
# Find every <a> tag and print its href attribute
links = soup.find_all('a')
for link in links:
    print(link['href'])
Output:
/red-wine
/white-wine
/rose-wine
/fine-wine
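If installing a third-party package is not an option, roughly the same result can probably be had with the standard library's `html.parser`. This is a sketch of a different technique from the answer above, not Beautiful Soup itself; the `HrefCollector` class and the `items` list are hypothetical names used only for illustration.

```python
from html.parser import HTMLParser


class HrefCollector(HTMLParser):
    """Collects the href attribute of every <a> start tag it sees (illustrative class)."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the tag
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.hrefs.append(value)


# Hypothetical input: the scraped strings from the question.
items = [
    '<li class="subnav__item"><a class="subnav__link " href="/red-wine">Red Wine</a></li>',
    '<li class="subnav__item"><a class="subnav__link " href="/white-wine">White Wine</a></li>',
    '<li class="subnav__item"><a class="subnav__link " href="/rose-wine">Rosé Wine</a></li>',
    '<li class="subnav__item"><a class="subnav__link " href="/fine-wine">Fine Wine</a></li>',
]

collector = HrefCollector()
for item in items:
    collector.feed(item)  # feed each fragment to the same parser
collector.close()

stdlib_hrefs = collector.hrefs
print(stdlib_hrefs)
```

The trade-off is that `html.parser` only reports tags as it encounters them, so there is no tree to query; for anything beyond pulling out a single attribute, lxml or Beautiful Soup is likely the more comfortable choice.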