I am using Beautiful Soup. My code is:
import urllib2  # Python 2 standard library; needed for Request/urlopen below

from bs4 import BeautifulSoup

web_address = 'xxxx'  # this part is fine; I don't want to provide the website
req = urllib2.Request(web_address)
page = urllib2.urlopen(req)
content = page.read()
soup = BeautifulSoup(content)
td = soup.findAll('td')  # collect every <td> cell on the page
for line in td:
    print(line.get_text())
The part of the HTML I am looking at is:
<td class="border_TopRight border_Left">
Text - TEST_NAME
<td class="border_TopRight">
Text - TEST_NAME_1
<td class="border_TopRight">
Text - TEST_NAME_2
<td class="apple dataCell border_TopRight font_green" id="Number of Apples" style="color: #333333; background-color: rgb(255, 255, 255);" rel="Apples ">
Text - 999999.999
The output of my Python script is:
TEST_NAME
TEST_NAME_1
TEST_NAME_2
-
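I can't figure out why the last output is '-'. I have read the BS4 documentation and can't work out why the first three cells come back with the correct text while the last one comes back as '-'.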
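One way to narrow this down might be to print what each cell actually contains in the HTML that Beautiful Soup received, rather than what the browser shows. The sketch below is only an illustration: SAMPLE_HTML and describe_cells are hypothetical names, and the sample markup assumes that the raw page holds a placeholder '-' in the last cell (for example, if the number were filled in later by a script in the browser). That is a guess, not something confirmed for the real page.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the fetched page. The last cell is ASSUMED to
# contain only a placeholder "-" in the raw HTML; the real page may differ.
SAMPLE_HTML = """
<table><tr>
<td class="border_TopRight border_Left">TEST_NAME</td>
<td class="border_TopRight">TEST_NAME_1</td>
<td class="border_TopRight">TEST_NAME_2</td>
<td class="apple dataCell border_TopRight font_green" id="Number of Apples" rel="Apples ">-</td>
</tr></table>
"""


def describe_cells(html):
    """Return (id, class list, stripped text) for every <td> in html."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for td in soup.find_all("td"):
        # .get() returns None when the attribute is missing
        rows.append((td.get("id"), td.get("class"), td.get_text(strip=True)))
    return rows


cells = describe_cells(SAMPLE_HTML)
for cell_id, classes, text in cells:
    # repr() makes stray whitespace or placeholder characters visible
    print(cell_id, classes, repr(text))
```

Running the same function on the real `content` string would show whether the number is present in the downloaded HTML at all. If the last cell prints '-' there too, the value is probably not in the raw response and the parsing code is not at fault.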