AttributeError: 'HTTPResponse' object has no attribute 'replace'

Time: 2015-06-20 02:48:40

Tags: python python-3.x httpresponse scrape

Hi, I'm getting the error above. Why does it occur, what am I missing, and how do I get around it? Thanks.

try:
    import urllib.request as urllib2
except ImportError:
    import urllib2

from html2text import html2text

sock = html2text(urllib2.urlopen('http://www.example.com')) 
htmlSource = sock.read()                            
sock.close()                                        
print (htmlSource)

I'm running IDLE 3.4.3 on Windows 7.

3 answers:

Answer 0 (score: 2)

html2text expects the HTML to be passed in as a string, not as a response object, so read the response first:

source = urllib2.urlopen('http://www.example.com').read()
text = html2text(source)
print(text)

Prints:

# Example Domain

This domain is established to be used for illustrative examples in documents.
You may use this domain in examples without prior coordination or asking for
permission.

[More information...](http://www.iana.org/domains/example)

Answer 1 (score: 0)

replace is a method of strings, but what you have is a file-like object:

obj=urllib2.urlopen('http://www.example.com')
print obj

<addinfourl at 3066852812L whose fp = <socket._fileobject object at 0xb6d267ec>>

This version works:

#!/usr/bin/python

try:
    import urllib.request as urllib2
except ImportError:
    import urllib2

from html2text import html2text


source=urllib2.urlopen('http://www.example.com').read() 
s=html2text(source)

print s

Output:

This domain is established to be used for illustrative examples in documents.
You may use this domain in examples without prior coordination or asking for
permission.

[More information...](http://www.iana.org/domains/example)
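
To make the point about replace concrete, here is a minimal sketch of the difference between a string and a file-like object. It uses io.BytesIO as a stand-in for the object returned by urlopen so that it runs without a network connection; that stand-in and the sample HTML are assumptions for illustration, not part of the original code.

```python
import io

# Stand-in for the response returned by urlopen(): a binary file-like object.
# (io.BytesIO is an assumption so the example runs offline.)
response = io.BytesIO(b"<h1>Example Domain</h1>")

# A str has a .replace method; the file-like object does not.
print(hasattr("some text", "replace"))
print(hasattr(response, "replace"))

# Reading the object yields bytes; decoding the bytes yields a str,
# which is the kind of value a text-processing function can work with.
html = response.read().decode("utf-8")
print(html.replace("Example", "Sample"))
```

Any function that calls string methods on its argument will raise an AttributeError when handed the response object itself, which matches the error in the question.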

Answer 2 (score: 0)

I think I found a solution for Python 3.4: I decoded the source as UTF-8 and it worked.

#!/usr/bin/python

try:
    import urllib.request as urllib2
except ImportError:
    import urllib2

from html2text import html2text

source=urllib2.urlopen('http://www.example.com').read() 
s=html2text(source.decode("UTF-8"))

print (s)

Output:

# Example Domain

This domain is established to be used for illustrative examples in documents.
You may use this domain in examples without prior coordination or asking for
permission.

[More information...](http://www.iana.org/domains/example)
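
Hard-coding UTF-8 works for this page, but other pages may declare a different encoding. As a possible refinement (Python 3 only), the charset can be taken from the Content-Type header and UTF-8 used only as a fallback. In the sketch below, decode_body is a hypothetical helper, and the headers and body are made-up stand-ins for a real response so that the example runs offline.

```python
from email.message import Message


def decode_body(raw, headers, default="utf-8"):
    """Decode response bytes using the charset declared in Content-Type.

    Falls back to `default` when the header names no charset.
    """
    charset = headers.get_content_charset() or default
    return raw.decode(charset, errors="replace")


# Hypothetical headers and body standing in for a real HTTP response.
headers = Message()
headers["Content-Type"] = "text/html; charset=ISO-8859-1"
raw = "<p>café</p>".encode("iso-8859-1")

text = decode_body(raw, headers)
print(text)
```

With a real response object in Python 3, say resp = urllib.request.urlopen(url), the equivalent call would be decode_body(resp.read(), resp.headers), and the resulting string can then be passed to html2text.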