I'm confused as to why I can't download the entire contents of some JSON responses from FriendFeed using urllib2.
>>> import urllib2
>>> stream = urllib2.urlopen('http://friendfeed.com/api/room/the-life-scientists/profile?format=json')
>>> stream.headers['content-length']
'168928'
>>> data = stream.read()
>>> len(data)
61058
>>> # We can see here that I did not retrieve the full JSON
... # given that the stream doesn't end with a closing }
...
>>> data[-40:]
'ce2-003048343a40","name":"Vincent Racani'
How can I retrieve the full response with urllib2?
Answer 0 (score: 18)
The best way to get all of the data:
fp = urllib2.urlopen("http://www.example.com/index.cfm")
response = ""
while 1:
    data = fp.read()
    if not data:  # an empty read signals end of stream
        break
    response += data
print response
The reason is that, given the nature of sockets, .read() is not guaranteed to return the entire response in a single call. I believe this is discussed in the documentation (possibly for urllib), but I can't find it.
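The read-until-empty loop above can be exercised without a network connection. Here is a minimal sketch that uses io.BytesIO as a stand-in for the socket-backed response stream (the sample payload and chunk size are invented for illustration; a real urllib2 response object behaves the same way in that read(n) may return fewer bytes than requested):

```python
import io

# Stand-in for the HTTP response stream.
payload = b'{"key": "value"}' * 1000
stream = io.BytesIO(payload)

chunks = []
while True:
    data = stream.read(4096)  # read in bounded chunks
    if not data:              # empty read signals end of stream
        break
    chunks.append(data)

response = b"".join(chunks)
print(len(response) == len(payload))  # True
```

Accumulating chunks in a list and joining once at the end avoids the quadratic cost of repeated string concatenation for large responses.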
Answer 1 (score: 4)
Use tcpdump (or something like it) to monitor the actual network interactions -- then you can analyze why the site is breaking for certain client libraries. Make sure you repeat the test several times by scripting it, so you can see whether the problem is consistent:
import urllib2
url = 'http://friendfeed.com/api/room/friendfeed-feedback/profile?format=json'
stream = urllib2.urlopen(url)
expected = int(stream.headers['content-length'])
data = stream.read()
datalen = len(data)
print expected, datalen, expected == datalen
The site has worked consistently for me, so I can't provide an example of the failure :)
Answer 2 (score: 2)
Keep calling stream.read() until it's done:
while True:
    data = stream.read()
    if not data:
        break
    # ... do stuff with data
Answer 3 (score: 0)
readlines()
also works.
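For completeness, a sketch of the readlines() approach, again using io.BytesIO in place of the real response stream (the sample payload is invented). readlines() drains a file-like object to EOF internally, so the pieces always join back into the complete response:

```python
import io

payload = b'line one\nline two\nline three\n'
stream = io.BytesIO(payload)

# Each element keeps its trailing newline, so joining the list
# reproduces the original bytes exactly.
data = b"".join(stream.readlines())
print(data == payload)  # True
```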