Given the following HTML:
<div class="rating-list">
<ul class="recommend">
<li>
<span class="recommend-titleInline">Stayed April 2013, traveled as a couple</span>
<ul class="recommend-column first">
<li class="recommend-answer">
<span class="rate rate_ss ss50">
<img class="sprite-ratings" src="http://c1.tacdn.com/img2/x.gif" alt="5 of 5 stars" content="5.0"/>
</span>
Value</li>
<li class="recommend-answer">
<span class="rate rate_ss ss50">
<img class="sprite-ratings" src="http://c1.tacdn.com/img2/x.gif" alt="5 of 5 stars" content="5.0"/>
</span>
Location</li>
<li class="recommend-answer">
<span class="rate rate_ss ss50">
<img class="sprite-ratings" src="http://c1.tacdn.com/img2/x.gif" alt="5 of 5 stars" content="5.0"/>
</span>
Sleep Quality</li>
</ul>
<ul class="recommend-column">
<li class="recommend-answer">
<span class="rate rate_ss ss50">
<img class="sprite-ratings" src="http://c1.tacdn.com/img2/x.gif" alt="5 of 5 stars" content="5.0"/>
</span>
Rooms</li>
<li class="recommend-answer">
<span class="rate rate_ss ss50">
<img class="sprite-ratings" src="http://c1.tacdn.com/img2/x.gif" alt="5 of 5 stars" content="5.0"/>
</span>
Cleanliness</li>
<li class="recommend-answer">
<span class="rate rate_ss ss50">
<img class="sprite-ratings" src="http://c1.tacdn.com/img2/x.gif" alt="5 of 5 stars" content="5.0"/>
</span>
Service</li>
</ul>
</li>
</ul>
</div>
I use BeautifulSoup to get the whole tag, and then I want to get the "li" tags like this:
valueRatingTag = subRatingListTags[i].find(name = 'li', attrs = { 'class' : 'recommend-answer' }, text = 'Value')
locationRatingTag = subRatingListTags[i].find(name = 'li', attrs = { 'class' : 'recommend-answer' }, text = 'Location')
sleepRatingTag = subRatingListTags[i].find(name = 'li', attrs = { 'class' : 'recommend-answer' }, text = 'Sleep Quality')
roomRatingTag = subRatingListTags[i].find(name = 'li', attrs = { 'class' : 'recommend-answer' }, text = 'Rooms')
cleanRatingTag = subRatingListTags[i].find(name = 'li', attrs = { 'class' : 'recommend-answer' }, text = 'Cleanliness')
serviceRatingTag = subRatingListTags[i].find(name = 'li', attrs = { 'class' : 'recommend-answer' }, text = 'Service')
But this seems to fail. All six variables are None, which is not what I expected. What should I do?
Answer 0 (score: 0)
Does using a regular expression as the text argument help?
import re
subRatingListTags[i].find(text=re.compile("Location"))
The newlines are probably causing the exact text match to fail.
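As a minimal sketch of this idea, the search below looks only at text nodes and then climbs to the enclosing li. The HTML is trimmed to two items from the question, and the helper name find_rating_li is hypothetical. When a tag name is combined with a text filter, BeautifulSoup compares the filter against the tag's .string, which is None for a tag with several children, so searching text nodes with a whitespace-tolerant pattern is one way around that. The sketch uses string=, the newer spelling of the text= argument.

```python
import re
from bs4 import BeautifulSoup

# Trimmed copy of the markup from the question (two of the six items).
html = """
<ul class="recommend-column first">
<li class="recommend-answer">
<span class="rate rate_ss ss50">
<img class="sprite-ratings" src="http://c1.tacdn.com/img2/x.gif" alt="5 of 5 stars" content="5.0"/>
</span>
Value</li>
<li class="recommend-answer">
<span class="rate rate_ss ss50">
<img class="sprite-ratings" src="http://c1.tacdn.com/img2/x.gif" alt="5 of 5 stars" content="5.0"/>
</span>
Location</li>
</ul>
"""

soup = BeautifulSoup(html, "html.parser")


def find_rating_li(root, label):
    """Return the li whose own text is `label`, ignoring surrounding whitespace.

    Hypothetical helper: it matches a text node first, then walks up to the li.
    """
    pattern = re.compile(r"^\s*" + re.escape(label) + r"\s*$")
    node = root.find(string=pattern)  # matches a text node, not a tag
    return node.find_parent("li") if node is not None else None


location_li = find_rating_li(soup, "Location")
print(location_li.img["alt"])
```

The pattern is anchored so that a label such as "Rooms" would not accidentally match a longer string that merely contains it.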
Answer 1 (score: 0)
It's not clear what you want. Anyway:
>>> lis = [t for t in soup.find_all('li', 'recommend-answer')]
>>> lis[0].text
'\n\n\n\nValue'
>>> lis[1].text
'\n\n\n\nLocation'
>>> lis[0].img['alt']
'5 of 5 stars'
You will definitely want to preprocess the HTML to remove all the newlines before you start parsing.
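Building on the list above, here is a rough sketch of one way to get a rating per label. Instead of editing the raw HTML, it strips the whitespace when reading each item with get_text(strip=True). The dictionary name ratings and the trimmed two-item sample are assumptions for illustration. It also assumes the img content attribute always holds the numeric score, as it does in the question's markup.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the markup from the question (two of the six items).
html = """
<ul class="recommend-column first">
<li class="recommend-answer">
<span class="rate rate_ss ss50">
<img class="sprite-ratings" src="http://c1.tacdn.com/img2/x.gif" alt="5 of 5 stars" content="5.0"/>
</span>
Value</li>
<li class="recommend-answer">
<span class="rate rate_ss ss50">
<img class="sprite-ratings" src="http://c1.tacdn.com/img2/x.gif" alt="5 of 5 stars" content="5.0"/>
</span>
Sleep Quality</li>
</ul>
"""

soup = BeautifulSoup(html, "html.parser")

ratings = {}
for li in soup.find_all("li", class_="recommend-answer"):
    # strip=True drops the leading newlines left by the nested span/img.
    label = li.get_text(strip=True)
    # The numeric score is carried in the img tag's content attribute.
    ratings[label] = float(li.img["content"])

print(ratings)
```

A lookup such as ratings.get("Value") then replaces the six separate find calls from the question.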