I want the output to be "Hindi" and "English". I can get "Hindi", but I am having trouble getting "English".
Input:
<td class="_480u">
<div class="clearfix">
<div><a data-hovercard="/ajax/hovercard/page.php?id=112969428713061" href="https://www.facebook.com/pages/Hindi/112969428713061">Hindi</a> and
<a data-hovercard="/ajax/hovercard/page.php?id=106059522759137" href="https://www.facebook.com/pages/English/106059522759137">English</a></div></div></td>
Code I tried:
>>> details.find('a',{'class':''}).string
u'Hindi'
s = details.findAll('a',{'class':''})
s1 = len(s)
list2 = []
if s1 >= 1:
    for j in range(0,s1):
        lang = s[j].find('a',{'class':''}).string.strip()
        list2.append(lang)
Traceback (most recent call last):
  File "<pyshell#220>", line 9, in <module>
    lang = s[j].find('a',{'class':''}).string.strip()
AttributeError: 'NoneType' object has no attribute 'string'
>>> s
[<a data-hovercard="/ajax/hovercard/page.php?id=112969428713061" href="https://www.facebook.com/pages/Hindi/112969428713061">Hindi</a>, <a data-hovercard="/ajax/hovercard/page.php?id=106059522759137" href="https://www.facebook.com/pages/English/106059522759137">English</a>]
Answer 0 (score: 1)
If this is the exact HTML and it will not change, you can use:
from bs4 import BeautifulSoup
html = '<td class="_480u">\
<div class="clearfix">\
<div><a data-hovercard="/ajax/hovercard/page.php?id=112969428713061" href="https://www.facebook.com/pages/Hindi/112969428713061">Hindi</a> and \
<a data-hovercard="/ajax/hovercard/page.php?id=106059522759137" href="https://www.facebook.com/pages/English/106059522759137">English</a></div></div></td>'
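soup = BeautifulSoup(html, 'html.parser')
first_link = soup.find('a',{'class':''})
print(first_link.string)
# The text " and " sits between the two links, so step over two siblings to reach the second <a>.
print(first_link.next_sibling.next_sibling.string)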
Output:
Hindi
English
Alternatively, you can do this (if you are working only with the HTML you posted in the question):
from bs4 import BeautifulSoup
html = '<td class="_480u">\
<div class="clearfix">\
<div><a data-hovercard="/ajax/hovercard/page.php?id=112969428713061" href="https://www.facebook.com/pages/Hindi/112969428713061">Hindi</a> and \
<a data-hovercard="/ajax/hovercard/page.php?id=106059522759137" href="https://www.facebook.com/pages/English/106059522759137">English</a></div></div></td>'
soup = BeautifulSoup(html, 'html.parser')
# href=True keeps only the <a> tags that carry an href attribute.
lang = soup.find_all('a', href=True)
for i in lang:
    print(i.string)
Output:
Hindi
English
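As for why the loop in the question fails: each element of s is already an <a> tag, so calling .find('a', ...) on it searches for a nested <a>, finds none, and returns None. Reading the text from the element itself avoids the AttributeError. A minimal sketch, assuming the same HTML as in the question; the function name extract_languages is only illustrative and not part of BeautifulSoup:

```python
from bs4 import BeautifulSoup

# Same markup as in the question, split across lines for readability.
html = (
    '<td class="_480u"><div class="clearfix"><div>'
    '<a data-hovercard="/ajax/hovercard/page.php?id=112969428713061" '
    'href="https://www.facebook.com/pages/Hindi/112969428713061">Hindi</a> and '
    '<a data-hovercard="/ajax/hovercard/page.php?id=106059522759137" '
    'href="https://www.facebook.com/pages/English/106059522759137">English</a>'
    '</div></div></td>'
)


def extract_languages(markup):
    """Return the stripped text of every <a> tag that has an href (hypothetical helper)."""
    soup = BeautifulSoup(markup, "html.parser")
    # Each item from find_all is the <a> tag itself, so read its text directly
    # instead of calling .find('a') on it again.
    return [a.get_text(strip=True) for a in soup.find_all("a", href=True)]


languages = extract_languages(html)
print(languages)
```

This collects the names into a list, much like list2 in the question, and should keep working if more language links are added to the same block.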