Question

Hello, I am using Python 3 to create an application that takes a URL and returns the text of the website without any HTML tags, just clean, plain text.

Here is my code, which should work but doesn't:

import urllib, formatter, sys
from urllib.request import urlopen
from html.parser import HTMLParser

website = urlopen("http://www.google.com")
data = website.read()
website.close()

format = formatter.AbstractFormatter(formatter.DumbWriter(sys.stdout))
ptext = HTMLParser(format)
ptext.feed(data)
ptext.close()

Error:

  File "app.py", line 11, in <module>
    ptext.feed(data)
  File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/html/parser.py", line 144, in feed
    self.rawdata = self.rawdata + data
TypeError: Can't convert 'bytes' object to str implicitly

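I found a solution that gets past the error, but I still don't get the proper result. The solution is to change the following line:

ptext.feed(data)

to:

ptext.feed(data.decode("UTF-8"))

The problem now is that the program runs in the terminal without any error but also without any output. The code worked in the tutorial where I saw it.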

Thanks.
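
For reference, here is a minimal sketch of one way the goal described above could be approached. It is not a confirmed diagnosis of the tutorial code. It relies on two facts about Python 3: the handler methods of the base `html.parser.HTMLParser` class do nothing by default, so `feed()` alone prints nothing, and the constructor does not accept a formatter (the `formatter` module was deprecated in 3.4 and removed in 3.10). The names `TextExtractor` and `html_to_text` are hypothetical, and skipping `<script>` and `<style>` content is an assumption about what "clean text" should mean.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Hypothetical helper: collects text nodes, skipping <script>/<style>."""

    SKIPPED_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._chunks = []      # non-empty text fragments, in document order
        self._skip_depth = 0   # > 0 while inside a skipped element

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # The base class ignores text; overriding this is what yields output.
        text = data.strip()
        if text and self._skip_depth == 0:
            self._chunks.append(text)

    def get_text(self):
        return "\n".join(self._chunks)


def html_to_text(html):
    """Return the visible text of an HTML string, one fragment per line."""
    parser = TextExtractor()
    parser.feed(html)   # feed() expects str, not bytes
    parser.close()
    return parser.get_text()


if __name__ == "__main__":
    sample = "<h1>Title</h1><script>var x = 1;</script><p>Hello</p>"
    print(html_to_text(sample))
```

With the code from the question, this would be called as `html_to_text(data.decode("UTF-8"))`, assuming the page really is UTF-8 encoded. A page in a different encoding would need the charset from the response headers instead.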

HTMLParser error with types in Python

0 answers: