Hi, I'm getting the error above. Why does it appear, what am I missing, and how do I get around it? Thanks.
try:
    import urllib.request as urllib2
except ImportError:
    import urllib2
from html2text import html2text
sock = html2text(urllib2.urlopen('http://www.example.com'))
htmlSource = sock.read()
sock.close()
print(htmlSource)
I'm running IDLE 3.4.3 on Windows 7.
Answer 0 (score: 2)
html2text expects the HTML passed to it to be a string, so read the response first:
source = urllib2.urlopen('http://www.example.com').read()
text = html2text(source)
print(text)
This prints:
# Example Domain
This domain is established to be used for illustrative examples in documents.
You may use this domain in examples without prior coordination or asking for
permission.
[More information...](http://www.iana.org/domains/example)
Answer 1 (score: 0)
replace is a method of strings, but what you have is a file-like object:
obj=urllib2.urlopen('http://www.example.com')
print obj
<addinfourl at 3066852812L whose fp = <socket._fileobject object at 0xb6d267ec>>
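The difference between the response object and its contents can be checked without a network connection. The following is only a minimal sketch: `io.BytesIO` is used as a stand-in for the object returned by `urlopen` (an assumption for illustration, not the real response class), and it assumes Python 3, where `read()` returns bytes.

```python
import io

# Stand-in for the file-like object returned by urlopen()
# (an assumption for illustration; the real class differs).
fake_response = io.BytesIO(b"<h1>Example Domain</h1>")

# A file-like object has read(), but none of the str methods.
print(hasattr(fake_response, "replace"))

# read() yields the raw body; decoding it yields a real string.
raw = fake_response.read()
text = raw.decode("utf-8")
print(type(raw).__name__, type(text).__name__)

# String methods such as replace() are now available.
print(text.replace("Example", "Sample"))
```

Any function that calls string methods on its argument will therefore fail when handed the response object itself rather than the decoded text.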
This works:
#!/usr/bin/python
try:
    import urllib.request as urllib2
except ImportError:
    import urllib2
from html2text import html2text
source = urllib2.urlopen('http://www.example.com').read()
s = html2text(source)
print(s)
Output:
This domain is established to be used for illustrative examples in documents.
You may use this domain in examples without prior coordination or asking for
permission.
[More information...](http://www.iana.org/domains/example)
Answer 2 (score: 0)
I think I found a solution for Python 3.4. I just decoded the source as UTF-8 and it worked.
#!/usr/bin/python
try:
    import urllib.request as urllib2
except ImportError:
    import urllib2
from html2text import html2text
source = urllib2.urlopen('http://www.example.com').read()
s = html2text(source.decode("UTF-8"))
print(s)
Output:
# Example Domain
This domain is established to be used for illustrative examples in documents.
You may use this domain in examples without prior coordination or asking for
permission.
[More information...](http://www.iana.org/domains/example)
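Hardcoding UTF-8 works for this page, but other servers may declare a different encoding. As a rough sketch for Python 3 only, the charset can be taken from the response headers, with UTF-8 as a fallback. The helper names `decode_body` and `fetch_html` are hypothetical, made up for this illustration and not part of urllib or html2text.

```python
from urllib.request import urlopen


def decode_body(raw, charset=None, fallback="utf-8"):
    """Decode response bytes to str (hypothetical helper).

    Uses the declared charset if there is one, otherwise the fallback.
    Undecodable bytes are replaced instead of raising an error.
    """
    return raw.decode(charset or fallback, errors="replace")


def fetch_html(url):
    """Fetch a URL and return its body as a str (hypothetical helper)."""
    with urlopen(url) as response:
        # get_content_charset() returns None when the server
        # does not declare a charset in the Content-Type header.
        charset = response.headers.get_content_charset()
        return decode_body(response.read(), charset)


# Offline check of the decoding step, no network needed.
print(decode_body(b"caf\xc3\xa9"))             # UTF-8 bytes, fallback used
print(decode_body(b"caf\xe9", "latin-1"))      # declared charset used
```

The string returned by `fetch_html('http://www.example.com')` could then be passed to `html2text` in place of `source.decode("UTF-8")`.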