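I'm writing a script to download gzipped XML sitemaps. The files download, but they are corrupted: the gzip files the script writes are slightly larger than they should be, and the decompressed files are smaller than they should be because data is missing. Any idea what I'm doing wrong?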
saveAddress = "test.xml.gz"
import urllib2
import httplib
from urllib2 import Request, urlopen, URLError
try:
    request = urllib2.Request("http://example.com/sitemap-general.xml.gz")
    request.add_header('Accept-encoding', 'gzip')
    request.add_header('User-agent', 'Custom UA String')
    opener = urllib2.build_opener()
    try:
        pageText = opener.open(request).read()
        open(saveAddress, "w").write(pageText)
        print "Crawled successfully."
    except URLError, e:
        pass
except URLError, e:
    pass
Thanks for your help, much appreciated.
Answer 0 (score: 6)
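Open the file in binary mode. In text mode on Windows, every "\n" byte in the compressed data is written as "\r\n", which would explain why the saved file is slightly larger than expected and no longer decompresses fully: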
open(saveAddress, "wb").write(pageText)
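
To check that a binary-mode write really preserves the payload, a round trip through the gzip module can help. The following is a minimal sketch in Python 3 (where urllib2 became urllib.request), not the original script: the helper names save_binary and roundtrip_ok are hypothetical, and sample_xml stands in for the bytes that opener.open(request).read() would return, so no network access is needed.

```python
import gzip
import os
import tempfile


def save_binary(path, data):
    """Write raw bytes to path; "wb" disables newline translation."""
    with open(path, "wb") as handle:
        handle.write(data)


def roundtrip_ok(path, original_xml):
    """Return True if the gzip file at path decompresses to original_xml."""
    with gzip.open(path, "rb") as handle:
        return handle.read() == original_xml


# Assumption: stand-in for the downloaded response body.
sample_xml = b"<urlset>\n<url><loc>http://example.com/</loc></url>\n</urlset>\n"
payload = gzip.compress(sample_xml)

save_path = os.path.join(tempfile.mkdtemp(), "test.xml.gz")
save_binary(save_path, payload)

# The file on disk should be byte-for-byte the same size as the payload,
# and it should decompress back to the original XML.
size_matches = os.path.getsize(save_path) == len(payload)
content_matches = roundtrip_ok(save_path, sample_xml)
```

Using a with block also closes the file promptly, which the one-line open(...).write(...) form leaves to the interpreter.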