Can't get the data in the HTML attribute "alt="
from bs4 import BeautifulSoup
import re
soup=BeautifulSoup("""<div class="couponTable">
<div id="tgCou1" class="tgCoupon couponRow"><span class="spBtnMinus"></span><!-- react-text: 67 -->Wednesday Matches<!-- /react-text --></div>
<div class="cflag"><img src="/ContentServer/jcbw/images/flag_JLC.gif?CV=L302R1g" alt="Japanese League Cup" title="Japanese League Cup" class="cfJLC"></div>
<div class="cflag"><img src="/ContentServer/jcbw/images/flag_JLC.gif?CV=L302R1g" alt="Japanese League Cup" title="Japanese League Cup" class="cfJLC"></div>
</div></div></div>""")
lines=soup.find_all('div')
for line in lines:
    print(re.findall(r'\w+', line['alt'])[0])
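In the loop above, find_all('div') returns the <div> elements, and none of them carries an alt attribute. The attribute sits on the nested <img>, so line['alt'] raises KeyError on the very first element. Here is a minimal check on a trimmed copy of the question's markup (shortened to one flag row for brevity). It contrasts Tag.get, which returns None for a missing attribute, with subscripting, which raises:

```python
from bs4 import BeautifulSoup

# Trimmed copy of the markup from the question (one flag row only)
html = """<div class="couponTable">
<div class="cflag"><img src="/ContentServer/jcbw/images/flag_JLC.gif?CV=L302R1g" alt="Japanese League Cup" title="Japanese League Cup" class="cfJLC"></div>
</div>"""
soup = BeautifulSoup(html, 'html.parser')

divs = soup.find_all('div')
# The alt attribute lives on the nested <img>, not on any <div>;
# Tag.get returns None when the attribute is missing
print([div.get('alt') for div in divs])  # [None, None]

try:
    divs[0]['alt']  # subscripting a missing attribute raises KeyError
except KeyError as exc:
    print('KeyError:', exc)  # KeyError: 'alt'
```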
Answer 0 (score: 1)
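If you only need the alt values, it is better to search for img tags instead of div tags, because the alt attribute is set on the <img> elements, not on the <div> elements. There is also no need for a regular expression to extract the alt value.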
from bs4 import BeautifulSoup
soup=BeautifulSoup("""<div class="couponTable">
<div id="tgCou1" class="tgCoupon couponRow"><span class="spBtnMinus"></span><!-- react-text: 67 -->Wednesday Matches<!-- /react-text --></div>
<div class="cflag"><img src="/ContentServer/jcbw/images/flag_JLC.gif?CV=L302R1g" alt="Japanese League Cup" title="Japanese League Cup" class="cfJLC"></div>
<div class="cflag"><img src="/ContentServer/jcbw/images/flag_JLC.gif?CV=L302R1g" alt="Japanese League Cup" title="Japanese League Cup" class="cfJLC"></div>
</div></div></div>""",'html.parser')
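# Search the <img> tags directly; each one carries the alt attribute
lines=soup.find_all('img')
for line in lines:
    # Read the attribute value as-is, no regex needed
    print(line['alt'])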
Output
Japanese League Cup
Japanese League Cup
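Two other ways to collect the same values, as a sketch. find_all('img', alt=True) keeps only <img> tags that actually have an alt attribute, so a tag without one cannot raise KeyError. soup.select with a CSS selector can scope the search to the div.cflag containers from the markup; select relies on the soupsieve package, which current bs4 releases install as a dependency. The dict.fromkeys step is an optional extra, not part of the answer above, for dropping the repeated league name while keeping order:

```python
from bs4 import BeautifulSoup

html = """<div class="couponTable">
<div id="tgCou1" class="tgCoupon couponRow"><span class="spBtnMinus"></span><!-- react-text: 67 -->Wednesday Matches<!-- /react-text --></div>
<div class="cflag"><img src="/ContentServer/jcbw/images/flag_JLC.gif?CV=L302R1g" alt="Japanese League Cup" title="Japanese League Cup" class="cfJLC"></div>
<div class="cflag"><img src="/ContentServer/jcbw/images/flag_JLC.gif?CV=L302R1g" alt="Japanese League Cup" title="Japanese League Cup" class="cfJLC"></div>
</div>"""
soup = BeautifulSoup(html, 'html.parser')

# Option 1: only <img> tags that have an alt attribute (alt=True)
alts = [img['alt'] for img in soup.find_all('img', alt=True)]

# Option 2: CSS selector limited to images with alt inside div.cflag
alts_css = [img['alt'] for img in soup.select('div.cflag img[alt]')]

print(alts)      # ['Japanese League Cup', 'Japanese League Cup']
print(alts_css)  # ['Japanese League Cup', 'Japanese League Cup']

# Optional: both flag rows name the same competition; drop repeats, keep order
print(list(dict.fromkeys(alts)))  # ['Japanese League Cup']
```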