How do I extract dynamic substrings from a list of strings in Python?

Time: 2018-01-17 07:29:32

Tags: python html loops

I have a list of strings that I scraped from the internet, and I want to extract the 'href' from each one:

<li class="subnav__item"><a class="subnav__link " href="/red-wine">Red Wine</a></li>
<li class="subnav__item"><a class="subnav__link " href="/white-wine">White Wine</a></li>
<li class="subnav__item"><a class="subnav__link " href="/rose-wine">Rosé Wine</a></li>
<li class="subnav__item"><a class="subnav__link " href="/fine-wine">Fine Wine</a></li>

For example, I want to iterate over the list and dynamically extract

/red-wine

from

<li class="subnav__item"><a class="subnav__link " href="/red-wine">Red Wine</a></li>

Thanks!

2 answers:

Answer 0: (score: 1)

You can use lxml. Something like this:

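from lxml import html
import requests

# Replace '<your url>' with the page you are scraping.
response = requests.get('<your url>')
tree = html.fromstring(response.text)
# The class "subnav__item" is on the <li>, so select the <a> inside it.
href = tree.xpath('//li[@class="subnav__item"]/a/@href')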

This returns a list with the href of every "subnav__item" entry.

Answer 1: (score: 1)

You can also use Beautiful Soup to get the text you need:

from bs4 import BeautifulSoup
data = '\
<li class="subnav__item"><a class="subnav__link " href="/red-wine">Red Wine</a></li>\
<li class="subnav__item"><a class="subnav__link " href="/white-wine">White Wine</a></li>\
<li class="subnav__item"><a class="subnav__link " href="/rose-wine">Rosé Wine</a></li>\
<li class="subnav__item"><a class="subnav__link " href="/fine-wine">Fine Wine</a></li>'
soup = BeautifulSoup(data, "html.parser")

# find_all returns every <a> tag in the parsed markup.
links = soup.find_all('a')
for link in links:
    print(link['href'])

Output:

/red-wine
/white-wine
/rose-wine
/fine-wine
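
If installing a third-party parser is not an option, the same extraction can probably be done with only the standard library. The following is a minimal sketch, not part of either answer: it assumes the scraped data really is a Python list of HTML strings, as described in the question, and the names HrefCollector, extract_hrefs and items are made up for illustration.

```python
from html.parser import HTMLParser


class HrefCollector(HTMLParser):
    """Collects the href attribute of every <a> start tag it is fed."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value is None for bare attributes.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.hrefs.append(value)


def extract_hrefs(items):
    """Return the href values found in a list of HTML strings, in order."""
    collector = HrefCollector()
    for item in items:
        collector.feed(item)
    collector.close()
    return collector.hrefs


# Sample data copied from the question (hypothetical variable name).
items = [
    '<li class="subnav__item"><a class="subnav__link " href="/red-wine">Red Wine</a></li>',
    '<li class="subnav__item"><a class="subnav__link " href="/white-wine">White Wine</a></li>',
    '<li class="subnav__item"><a class="subnav__link " href="/rose-wine">Rosé Wine</a></li>',
    '<li class="subnav__item"><a class="subnav__link " href="/fine-wine">Fine Wine</a></li>',
]

hrefs = extract_hrefs(items)
print(hrefs)
```

Using a parser rather than string slicing or a regular expression means the result does not depend on attribute order or on the stray trailing space in the class attribute.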