Getting data from a dynamic class="?" with bs4

Time: 2018-09-04 16:32:52

Tags: python beautifulsoup

Can I match a dynamic class, i.e. class="cfBD1" and class="cfJLC", or more generally class="????", to get the alt="data" value from the tag?

from bs4 import BeautifulSoup

soup=BeautifulSoup("""<div class="couponTable"><div id="tgCou1" class="tgCoupon couponRow"><span class="spBtnMinus"></span><!-- react-text: 67 -->Wednesday Matches<!-- /react-text --></div><div class="couponRow rAlt1 tgCou1" id="rmid20180905WED1"><img src="/ContentServer/jcbw/images/flag_JLC.gif?CV=L302R1g" alt="Japanese League Cup" title="Japanese League Cup" class="cfJLC"><img src="/ContentServer/jcbw/images/icon_tv-C661.gif?CV=L302R1g" alt="C661-i-CABLE 661 C601-i-CABLE 601" title="C661-i-CABLE 661 C601-i-CABLE 601"></span></span><img src="/football/info/images/btn_odds.gif?CV=L302R1g" alt="All Odds" title="All Odds"></a></div><div class="couponRow rAlt0 tgCou1" id="rmid20180905WED2"><img src="/ContentServer/jcbw/images/flag_JLC.gif?CV=L302R1g" alt="Japanese League Cup" title="Japanese League Cup" class="cfJLC"><img src="/ContentServer/jcbw/images/icon_tv-C662.gif?CV=L302R1g" alt="C662-i-CABLE 662 C602-i-CABLE 602" title="C662-i-CABLE 662 C602-i-CABLE 602"></span></span><img src="/football/info/images/btn_odds.gif?CV=L302R1g" alt="All Odds" title="All Odds"></a></div></div></div><div class="couponRow rAlt1 tgCou1" id="rmid20180905WED12"><img src="/ContentServer/jcbw/images/flag_BD1.gif?CV=L302R1g" alt="Brazilian Division 1" title="Brazilian Division 1" class="cfBD1"><img src="/football/info/images/btn_odds.gif?CV=L302R1g" alt="All Odds" title="All Odds"></a></div></div>""",'html.parser')

lines=soup.find_all('img')
for line in lines:
    print(line['alt'])

Output:

Japanese League Cup
C661-i-CABLE 661 C601-i-CABLE 601
All Odds
Japanese League Cup
C662-i-CABLE 662 C602-i-CABLE 602
All Odds
Brazilian Division 1
All Odds

Expected output:

Japanese League Cup
Japanese League Cup
Brazilian Division 1

1 answer:

Answer 0 (score: 1)

In this case, you can simply check whether the img tag has a class attribute at all:

soup.find_all('img', attrs={'class': True})

Example:

In [1570]: [img['alt'] for img in soup.find_all('img', attrs={'class': True})]
Out[1570]: ['Japanese League Cup', 'Japanese League Cup', 'Brazilian Division 1']
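
As a self-contained sketch of the same check, the snippet below uses a shortened, hypothetical version of the markup from the question (the `html` string and the variable names are illustrative, not the asker's full page). It also shows `class_=True`, which should behave the same as `attrs={'class': True}`:

```python
from bs4 import BeautifulSoup

# Shortened, hypothetical sample modelled on the question's markup.
html = """
<div class="couponRow">
  <img alt="Japanese League Cup" class="cfJLC">
  <img alt="C661-i-CABLE 661 C601-i-CABLE 601">
  <img alt="All Odds">
  <img alt="Brazilian Division 1" class="cfBD1">
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# True matches any tag that carries the attribute, whatever its value.
with_class = [img["alt"] for img in soup.find_all("img", attrs={"class": True})]

# class_ is the keyword shortcut for filtering on the class attribute.
with_class_kw = [img["alt"] for img in soup.find_all("img", class_=True)]

print(with_class)
print(with_class_kw)
```

This only works as long as the flag images are the only img tags with a class; if other images gain a class later, a narrower filter is needed.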

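For completeness: to match an arbitrary dynamic attribute value, you need to find a common pattern in the naming. In this case, for example, all the class names appear to start with the character c, so you can use a CSS selector: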

img[class^="c"]

Example:

In [1571]: [img['alt'] for img in soup.select('img[class^="c"]')]
Out[1571]: ['Japanese League Cup', 'Japanese League Cup', 'Brazilian Division 1']
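
The prefix idea can also be written as a regular expression passed to `find_all`. This is a minimal sketch, assuming the shared prefix is `cf` (a guess based on `cfJLC` and `cfBD1`); the sample markup and the extra `caption` class are hypothetical and only there to show that a tighter prefix excludes unrelated classes:

```python
import re
from bs4 import BeautifulSoup

# Hypothetical sample; "caption" stands in for an unrelated class starting with "c".
html = """
<img alt="Japanese League Cup" class="cfJLC">
<img alt="All Odds">
<img alt="Some banner" class="caption">
<img alt="Brazilian Division 1" class="cfBD1">
"""

soup = BeautifulSoup(html, "html.parser")

# CSS attribute selector: the class attribute value begins with "cf".
by_selector = [img["alt"] for img in soup.select('img[class^="cf"]')]

# Regex filter: tested against each individual class name of the tag.
by_regex = [img["alt"] for img in soup.find_all("img", class_=re.compile(r"^cf"))]

print(by_selector)
print(by_regex)
```

One difference to keep in mind: the `^=` selector tests the start of the whole attribute string, so it would miss a tag such as class="flag cfJLC", whereas the regex variant checks each class name separately.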