Unable to get data inside an HTML tag

Date: 2018-09-04 07:27:42

Tags: python regex beautifulsoup

I can't get the data inside the HTML attribute alt="...":
from bs4 import BeautifulSoup
import re
soup=BeautifulSoup("""<div class="couponTable">
    <div id="tgCou1" class="tgCoupon couponRow"><span class="spBtnMinus"></span><!-- react-text: 67 -->Wednesday Matches<!-- /react-text --></div>
    <div class="cflag"><img src="/ContentServer/jcbw/images/flag_JLC.gif?CV=L302R1g" alt="Japanese League Cup" title="Japanese League Cup" class="cfJLC"></div>
    <div class="cflag"><img src="/ContentServer/jcbw/images/flag_JLC.gif?CV=L302R1g" alt="Japanese League Cup" title="Japanese League Cup" class="cfJLC"></div>
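    </div></div></div>""", 'html.parser')

lines=soup.find_all('div')
for line in lines:
    # Fails with KeyError: 'alt', because none of the <div> tags has an alt attribute
    print(re.findall(r'\w+', line['alt'])[0])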
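To see why the loop above fails, here is a minimal sketch using a trimmed copy of the sample markup (the image src is shortened and the React comments dropped, purely for readability). It checks the alt attribute on every <div>, then reads it from the <img> instead, and also shows what the question's regex would have returned.

```python
import re

from bs4 import BeautifulSoup

# Trimmed copy of the question's markup (src shortened, React comments removed).
html = """<div class="couponTable">
<div id="tgCou1" class="tgCoupon couponRow"><span class="spBtnMinus"></span>Wednesday Matches</div>
<div class="cflag"><img src="flag_JLC.gif" alt="Japanese League Cup" title="Japanese League Cup" class="cfJLC"></div>
<div class="cflag"><img src="flag_JLC.gif" alt="Japanese League Cup" title="Japanese League Cup" class="cfJLC"></div>
</div>"""
soup = BeautifulSoup(html, "html.parser")

# Tag.get returns None for a missing attribute: none of the four <div> tags has alt.
div_alts = [div.get("alt") for div in soup.find_all("div")]
print(div_alts)

# Indexing with ["alt"], as the question does, raises KeyError instead.
raised = False
try:
    soup.find("div")["alt"]
except KeyError:
    raised = True
print("KeyError raised:", raised)

# The attribute actually lives on the <img> child.
img_alt = soup.find("img")["alt"]
# Even on the right tag, re.findall(r'\w+', ...)[0] would keep only the first word.
first_word = re.findall(r"\w+", img_alt)[0]
print(img_alt, "|", first_word)
```

So there are two problems: the lookup targets the wrong tag, and the regex would truncate the value to "Japanese" even if the lookup succeeded.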

1 answer:

Answer 0 (score: 1)

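If you only need the alt values, search for the img tags rather than the div tags, since alt is an attribute of <img>. There is also no need for a regular expression: the attribute value can be read directly.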

from bs4 import BeautifulSoup
soup=BeautifulSoup("""<div class="couponTable">
<div id="tgCou1" class="tgCoupon couponRow"><span class="spBtnMinus"></span><!-- react-text: 67 -->Wednesday Matches<!-- /react-text --></div>
<div class="cflag"><img src="/ContentServer/jcbw/images/flag_JLC.gif?CV=L302R1g" alt="Japanese League Cup" title="Japanese League Cup" class="cfJLC"></div>
<div class="cflag"><img src="/ContentServer/jcbw/images/flag_JLC.gif?CV=L302R1g" alt="Japanese League Cup" title="Japanese League Cup" class="cfJLC"></div>
</div></div></div>""",'html.parser')

imgs=soup.find_all('img')
for img in imgs:
    print(img['alt'])
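If some <img> tags on the real page might lack an alt attribute, img['alt'] would raise KeyError again. A minimal sketch of two filters that BeautifulSoup supports for this: find_all with alt=True (match only tags that have the attribute) and a CSS selector via select (which relies on the soupsieve package, installed as a dependency of recent beautifulsoup4 releases). The third <img> without alt is a hypothetical addition to the sample markup, included only to show the filtering.

```python
from bs4 import BeautifulSoup

# Sample markup; the last <img> (no alt) is hypothetical, added to show filtering.
html = """<div class="couponTable">
<div id="tgCou1" class="tgCoupon couponRow"><span class="spBtnMinus"></span>Wednesday Matches</div>
<div class="cflag"><img src="flag_JLC.gif" alt="Japanese League Cup" class="cfJLC"></div>
<div class="cflag"><img src="flag_JLC.gif" alt="Japanese League Cup" class="cfJLC"></div>
<div class="cflag"><img src="no_alt.gif" class="cfXYZ"></div>
</div>"""
soup = BeautifulSoup(html, "html.parser")

# alt=True keeps only <img> tags that actually carry an alt attribute,
# so img["alt"] cannot raise KeyError here.
alts = [img["alt"] for img in soup.find_all("img", alt=True)]

# Equivalent CSS selector: <img> with an alt attribute inside div.cflag.
alts_css = [img["alt"] for img in soup.select("div.cflag img[alt]")]

print(alts)
print(alts == alts_css)
```

The CSS form is handy when the alt values should only come from a specific container (here div.cflag) rather than from every image on the page.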

Output:

Japanese League Cup
Japanese League Cup
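If you would rather keep looping over the <div> rows as the question does (for example, to keep each flag tied to its row), one option is to descend from each div to its <img> child before reading alt. This is a minimal sketch, assuming the flags always sit in div.cflag as in the sample markup.

```python
from bs4 import BeautifulSoup

html = """<div class="couponTable">
<div id="tgCou1" class="tgCoupon couponRow"><span class="spBtnMinus"></span>Wednesday Matches</div>
<div class="cflag"><img src="flag_JLC.gif" alt="Japanese League Cup" class="cfJLC"></div>
<div class="cflag"><img src="flag_JLC.gif" alt="Japanese League Cup" class="cfJLC"></div>
</div>"""
soup = BeautifulSoup(html, "html.parser")

alts = []
# class_ (with a trailing underscore) filters on the CSS class,
# because "class" is a reserved word in Python.
for div in soup.find_all("div", class_="cflag"):
    img = div.find("img")  # first <img> inside this div, or None
    if img is not None and img.has_attr("alt"):
        alts.append(img["alt"])

print(alts)
```

The None and has_attr checks make the loop skip rows without a flag image instead of raising an exception.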