Beautiful Soup - get the title attribute values that contain a string

Date: 2018-11-23 05:45:42

Tags: python html beautifulsoup

Suppose we have the following HTML:

<span title="Sports Football">Football</span>
<span title="Sports Badminton">Tennis</span>
<span title="Sports Ski Jump">Ski Jump</span>

I want to extract the value of the title attribute when it contains Sports,

so that in the end we have a variable sports:

sports = ['Football', 'Badminton', 'Ski Jump']

This is what I used:

sports = soup.find_all('span', {'title': 'Sports'})

But I get nothing back.
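The sports list in the question is built from the title values (Badminton), not from the span text (Tennis). A minimal sketch of that, assuming every relevant title starts with the word Sports followed by whitespace (the prefix pattern is an assumption, not something the question specifies):

```python
import re
from bs4 import BeautifulSoup

html = """
<span title="Sports Football">Football</span>
<span title="Sports Badminton">Tennis</span>
<span title="Sports Ski Jump">Ski Jump</span>
"""

soup = BeautifulSoup(html, "html.parser")

# A plain string filter such as {'title': 'Sports'} only matches a title that is
# exactly "Sports"; a compiled regex is searched against each title value instead.
prefix = re.compile(r"^Sports\s+")

# Keep spans whose title starts with "Sports", then strip that prefix
# from the title value itself (not from the span text).
sports = [prefix.sub("", span["title"], count=1)
          for span in soup.find_all("span", title=prefix)]

print(sports)  # ['Football', 'Badminton', 'Ski Jump']
```

Reading span["title"] is safe here because the regex filter only returns spans that have a title.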

3 answers:

Answer 0 (score: 1)

If the first part of the title attribute is "Sports", you can use re.compile with BeautifulSoup to find all span tags:

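import re
from bs4 import BeautifulSoup as soup

content = """
 <span title="Sports Football">Football</span>
 <span title="Sports Badminton">Tennis</span>
 <span title="Sports Ski Jump">Ski Jump</span>
"""

d = soup(content, 'html.parser')
# the compiled regex is searched against each title value; ^Sports\s keeps titles starting with "Sports "
results = [i.text for i in d.find_all('span', {'title': re.compile(r'^Sports\s')})]
print(results)

Output:

['Football', 'Tennis', 'Ski Jump']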
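Besides a regex, Beautiful Soup also accepts a function as an attribute filter; it is called with the attribute value, or None when the tag lacks that attribute. A minimal sketch of the same prefix match using that form (the extra untitled span and the helper name starts_with_sports are hypothetical, added only to show the None case):

```python
from bs4 import BeautifulSoup

content = """
 <span title="Sports Football">Football</span>
 <span title="Sports Badminton">Tennis</span>
 <span title="Sports Ski Jump">Ski Jump</span>
 <span>Untitled</span>
"""

d = BeautifulSoup(content, "html.parser")

def starts_with_sports(title):
    # Called once per span with its title value; None means "no title attribute"
    return title is not None and title.startswith("Sports ")

results = [tag.text for tag in d.find_all("span", title=starts_with_sports)]
print(results)  # ['Football', 'Tennis', 'Ski Jump']
```

A function filter is handy when the condition is easier to express in plain Python than as a regular expression.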

Answer 1 (score: 0)

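You get nothing because no title is named exactly Sports; an attribute filter given as a plain string does not work like a wildcard. If you want the values of the title attribute, call get(attr_name) on the tag objects returned by find_all: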

from bs4 import BeautifulSoup

html = '''<span title="Sports Football">Football</span>
<span title="Sports Badminton">Tennis</span>
<span title="Sports Ski Jump">Ski Jump</span>'''

soup = BeautifulSoup(html,"lxml")

title = [s.get('title') for s in soup.find_all('span')]
print(title)
# ['Sports Football', 'Sports Badminton', 'Sports Ski Jump']
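get() returns None when a tag has no such attribute (span["title"] would raise KeyError instead), so the list of titles can be filtered and trimmed with ordinary string operations. A minimal sketch, assuming the prefix is the literal "Sports " (the fourth, untitled span is a hypothetical addition to show the None case):

```python
from bs4 import BeautifulSoup

html = '''<span title="Sports Football">Football</span>
<span title="Sports Badminton">Tennis</span>
<span title="Sports Ski Jump">Ski Jump</span>
<span>Untitled</span>'''

soup = BeautifulSoup(html, "html.parser")

# .get() yields None for the span without a title instead of raising
titles = [s.get("title") for s in soup.find_all("span")]
print(titles)  # ['Sports Football', 'Sports Badminton', 'Sports Ski Jump', None]

# Keep titles that start with the prefix and cut the prefix off
prefix = "Sports "
sports = [t[len(prefix):] for t in titles if t and t.startswith(prefix)]
print(sports)  # ['Football', 'Badminton', 'Ski Jump']
```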

Besides that, if you only need the element's text, use .text on the tag objects returned by find_all:

sports = [s.text for s in soup.find_all('span')]
print(sports)
# ['Football', 'Tennis', 'Ski Jump']
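The prefix condition can also be written as a CSS attribute selector, [title^="..."] ("value starts with"), passed to select(). A minimal sketch; it assumes the soupsieve package, which pip installs alongside beautifulsoup4 4.7 and later, and the News span is a hypothetical non-matching element:

```python
from bs4 import BeautifulSoup

html = '''<span title="Sports Football">Football</span>
<span title="Sports Badminton">Tennis</span>
<span title="Sports Ski Jump">Ski Jump</span>
<span title="News">Headlines</span>'''

soup = BeautifulSoup(html, "html.parser")

# ^= matches attribute values that begin with the given string
spans = soup.select('span[title^="Sports "]')

texts = [s.get_text() for s in spans]
print(texts)  # ['Football', 'Tennis', 'Ski Jump']
```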

Answer 2 (score: -1)

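Perhaps the example you gave was just made up off the top of your head, but the contents of your spans are exactly what you are looking for, so in this example you can take the strings from the span contents instead. Because find_all returns a list of tags, .contents has to be read from each tag rather than from the list, and the title filter needs a prefix match rather than the exact value 'Sports' (this needs import re): sports = [s.contents[0] for s in soup.find_all('span', {'title': re.compile('^Sports')})] This gives you the string versions you want.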
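Calling .contents directly on the result of find_all fails because find_all returns a ResultSet (a list of tags), even when it holds zero or one element. A minimal sketch showing the empty exact match, the AttributeError, and the per-tag access:

```python
from bs4 import BeautifulSoup

html = '''<span title="Sports Football">Football</span>
<span title="Sports Badminton">Tennis</span>'''

soup = BeautifulSoup(html, "html.parser")

# An exact string filter: no title equals "Sports", so nothing matches
exact = soup.find_all("span", {"title": "Sports"})
print(len(exact))  # 0

spans = soup.find_all("span")
try:
    spans.contents  # a ResultSet has no .contents attribute
except AttributeError as err:
    print("AttributeError:", err)

# .contents is a list of child nodes on each individual tag
contents = [str(s.contents[0]) for s in spans]
print(contents)  # ['Football', 'Tennis']
```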