Question

Why doesn't this work when the text contains <br>? I get empty text.

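import urllib2  # Python 2 only; Python 3 uses urllib.request instead
from bs4 import BeautifulSoup  # assumed import; the question does not show its own

opener = urllib2.build_opener()
opener.addheaders = [('User-agent', 'Mozilla/5.0')]
address = 'http://www.bbc.com'
response = opener.open(address)
html = response.read()
soup = BeautifulSoup(html)
snaptext = soup.find('p', attrs={'class': 'displaytext'})
# Prints None when the <p> contains <br> tags
print snaptext.string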

An example:

<p>blahblahblah<br/>blah2blah2blah2<br/></p>

If the text contains <br>, the result is None.

Answer 1

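As you can see below, the <br> is not a problem in itself; the issue is your use of .string. It only returns a value when the tag has exactly one child. When the tag has several children (here, text nodes mixed with <br> tags), .string returns None. You probably want .getText() instead (spelled get_text() in bs4, where getText() remains as an alias).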

>>> x = bs.find('div', attrs={'id': 'forum-post-body-183'})
>>> x
<div class="j-comment-body forum-post-body u-typography-format text" id="forum-post-body-183" itemprop="text">
<p>Let's try it! I will only replace Sir Finley with Ysera for late game pressing (and ev. win condition).<br>In edit of this comment i would report about results in casual battles (for start).</br></p>
</div>
>>> x.string
>>> print(x.string)
None
>>> x.getText()
"\nLet's try it! I will only replace Sir Finley with Ysera for late game pressing (and ev. win condition).In edit of this comment i would report about results in casual battles (for start).\n"

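The difference between .string and get_text() can be reproduced without fetching any page. This is a minimal sketch, assuming bs4 (BeautifulSoup 4) on Python 3 with the built-in html.parser. The inline markup and the "plain" class are hypothetical stand-ins modelled on the question's example, not the real page.

```python
from bs4 import BeautifulSoup

# Stand-in markup modelled on the question's example (hypothetical, not the real page).
html = (
    '<p class="plain">only text</p>'
    '<p class="displaytext">blahblahblah<br/>blah2blah2blah2<br/></p>'
)
soup = BeautifulSoup(html, "html.parser")

plain = soup.find("p", attrs={"class": "plain"})
mixed = soup.find("p", attrs={"class": "displaytext"})

# One child (a single text node): .string returns it.
plain_string = plain.string

# Several children (text, <br/>, text, <br/>): .string is None.
mixed_string = mixed.string

# get_text() concatenates every text node under the tag.
joined = mixed.get_text()

# A separator keeps the line breaks that the <br/> tags stood for.
lines = mixed.get_text(separator="\n", strip=True)

print(repr(plain_string))
print(repr(mixed_string))
print(repr(joined))
print(repr(lines))
```

If the line breaks matter, passing a separator to get_text() is probably the simplest way to keep them, since plain get_text() runs the two pieces of text together.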