Question

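Hi, I'm getting the error AttributeError: 'HTTPResponse' object has no attribute 'replace'. Why does it come up, what am I missing, and how do I get around it? Thanks.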

try:
    import urllib.request as urllib2
except ImportError:
    import urllib2

from html2text import html2text

sock = html2text(urllib2.urlopen('http://www.example.com')) 
htmlSource = sock.read()                            
sock.close()                                        
print (htmlSource)

I'm running IDLE 3.4.3 on Windows 7.

Answer 1

html2text expects the HTML to be passed in as a string, so read the response first:

source = urllib2.urlopen('http://www.example.com').read()
text = html2text(source)
print(text)

Prints:

# Example Domain

This domain is established to be used for illustrative examples in documents.
You may use this domain in examples without prior coordination or asking for
permission.

[More information...](http://www.iana.org/domains/example)
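
To see why the original call fails, here is a minimal sketch that needs no network access. It assumes, as the error message suggests, that html2text calls a string method such as replace() on whatever it is given. The function normalize_newlines is a hypothetical stand-in for that internal step, and io.BytesIO plays the role of the response object returned by urlopen().

```python
import io


def normalize_newlines(html):
    # Hypothetical stand-in for html2text's internals: it calls a str
    # method on its argument, so the argument must be a string.
    return html.replace("\r\n", "\n")


# Stand-in for the object returned by urlopen(): it has read(),
# but no replace() method.
response = io.BytesIO(b"<h1>Example Domain</h1>\r\n")

try:
    normalize_newlines(response)  # passing the object itself fails
    error = None
except AttributeError as exc:
    error = str(exc)

# Reading (and, on Python 3, decoding) first yields a real string.
text = normalize_newlines(response.read().decode("utf-8"))

print(error)
print(repr(text))
```

The first call raises AttributeError for the same reason as the code in the question; the second works because it receives the page content rather than the response object.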

Answer 2

replace is a method of strings, but what you have is a file-like object:

obj=urllib2.urlopen('http://www.example.com')
print(obj)

<addinfourl at 3066852812L whose fp = <socket._fileobject object at 0xb6d267ec>>

This works fine:

#!/usr/bin/python

try:
    import urllib.request as urllib2
except ImportError:
    import urllib2

from html2text import html2text


source=urllib2.urlopen('http://www.example.com').read() 
s=html2text(source)

print(s)

Output:

This domain is established to be used for illustrative examples in documents.
You may use this domain in examples without prior coordination or asking for
permission.

[More information...](http://www.iana.org/domains/example)

Answer 3

I think I found a solution for Python 3.4. There, read() returns bytes rather than str, so I just decoded the source as UTF-8 and it works.

#!/usr/bin/python

try:
    import urllib.request as urllib2
except ImportError:
    import urllib2

from html2text import html2text

source=urllib2.urlopen('http://www.example.com').read() 
s=html2text(source.decode("UTF-8"))

print (s)

Output:

# Example Domain

This domain is established to be used for illustrative examples in documents.
You may use this domain in examples without prior coordination or asking for
permission.

[More information...](http://www.iana.org/domains/example)
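
Hard-coding UTF-8 works for this page, but other servers may declare a different charset. A possible variation, sketched below for Python 3 only, takes the charset from the response headers and falls back to UTF-8. The helper names decode_html and fetch_html are made up for illustration; get_content_charset() is the standard email.message.Message method available on response.headers.

```python
import urllib.request


def decode_html(raw, declared_charset=None, fallback="utf-8"):
    """Turn the bytes from response.read() into a str for html2text."""
    encoding = declared_charset or fallback
    try:
        return raw.decode(encoding)
    except (LookupError, UnicodeDecodeError):
        # Unknown charset name or undecodable bytes: use the fallback
        # and substitute a replacement character for bad bytes.
        return raw.decode(fallback, errors="replace")


def fetch_html(url):
    # Hypothetical helper (needs network access, so it is not called here).
    with urllib.request.urlopen(url) as response:
        charset = response.headers.get_content_charset()  # may be None
        return decode_html(response.read(), charset)


print(decode_html(b"<h1>Example Domain</h1>"))
```

With this in place, html2text(fetch_html('http://www.example.com')) would receive a string no matter which charset the server declares.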

AttributeError: 'HTTPResponse' object has no attribute 'replace'
