I'm trying to extract data from a URL, but I get the error below when writing it to a file. It is raised whenever text is not empty.
My code:
    def gettextonly(self, url):
        url = url
        html = urllib.urlopen(url).read()
        soup = BeautifulSoup(html)
        # kill all script and style elements
        for script in soup(["script", "style", "a", "<div id=\"bottom\" >"]):
            script.extract()  # rip it out
        text = soup.findAll(text=True)
        #print text
        fo = open('foo.txt', 'w')
        fo.seek(0, 2)
        if text:
            line = fo.writelines(text.encode('utf8'))
        fo.close()
Error:
in gettextonly
line =fo.writelines(text.encode('utf8'))
AttributeError: 'ResultSet' object has no attribute 'encode'
Answer 0 (score: 5)
soup.findAll(text=True) returns a ResultSet object, which is basically a list and has no encode attribute. Either use .text instead:
text = soup.text
Or "join" the text:
text = "".join(soup.findAll(text=True))
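
To make the join-then-write step concrete, here is a minimal sketch. It assumes the text nodes behave like a list of unicode strings, so a plain list stands in for the ResultSet and no network access or BeautifulSoup install is needed. The helper name write_text_nodes and the temporary output path are made up for illustration and are not part of the original code.

```python
import io
import os
import tempfile


def write_text_nodes(nodes, path):
    """Join an iterable of text fragments and write them to path as UTF-8."""
    # A list (or ResultSet) has no .encode(); join it into one string first.
    text = u"".join(nodes)
    # io.open encodes on write, so no manual .encode('utf8') call is needed.
    with io.open(path, "w", encoding="utf-8") as fo:
        fo.write(text)
    return text


# Hypothetical stand-in for soup.findAll(text=True): a list of unicode strings.
nodes = [u"Hello, ", u"w\u00f6rld", u"\n"]
out_path = os.path.join(tempfile.mkdtemp(), "foo.txt")
written = write_text_nodes(nodes, out_path)
```

With the real parser you would pass soup.findAll(text=True) as nodes. Opening the file with an explicit encoding also avoids calling encode by hand, and the with block closes the file even if the write fails.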