Question

I am trying to write a web server (a proxy) so that I can make a request to http://localhost:8080/foo/bar and have it transparently return the response from https://www.gyford.com/foo/bar.

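The Python script below works for web pages themselves, but fails to return some files (e.g. https://www.gyford.com/static/hines/js/site-340675b4c7.min.js). If I request that file manually while the server is running, e.g.: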

import requests
r = requests.get('http://localhost:8080/static/hines/js/site-340675b4c7.min.js')

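Then I get:

('Received response with content-encoding: gzip, but failed to decode it.', error('Error -3 while decompressing data: incorrect header check',))

So I guess I need to handle gzipped files differently, but I can't work out how.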

from http.server import HTTPServer, BaseHTTPRequestHandler
import requests

HOST_NAME = 'localhost'
PORT_NUMBER = 8080
TARGET_DOMAIN = 'www.gyford.com'

class MyHandler(BaseHTTPRequestHandler):

    def do_GET(self):
        host_domain = '{}:{}'.format(HOST_NAME, PORT_NUMBER)

        host = self.headers.get('Host').replace(host_domain, TARGET_DOMAIN)

        url = ''.join(['https://', host, self.path])

        r = requests.get(url)

        self.send_response(r.status_code)

        for k,v in r.headers.items():
            self.send_header(k, v)

        self.end_headers()

        self.wfile.write( bytes(r.text, 'UTF-8') )

if __name__ == '__main__':
    server_class = HTTPServer
    httpd = server_class((HOST_NAME, PORT_NUMBER), MyHandler)
    try:
        httpd.serve_forever()
    except KeyboardInterrupt:
        pass
    httpd.server_close()

Edit: here is the output of print(r.headers):

{'Connection': 'keep-alive', 'Server': 'gunicorn/19.7.1', 'Date': 'Wed, 26 Sep 2018 13:43:43 GMT', 'Content-Type': 'application/javascript; charset="utf-8"', 'Cache-Control': 'max-age=60, public', 'Access-Control-Allow-Origin': '*', 'Vary': 'Accept-Encoding', 'Last-Modified': 'Thu, 20 Sep 2018 16:11:29 GMT', 'Etag': '"5ba3c6b1-6be"', 'Content-Length': '771', 'Content-Encoding': 'gzip', 'Via': '1.1 vegur'}

Answer 1

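Question: I need to handle gzipped files differently.

I wonder how this worked for the web pages themselves, but I assume the browser does some magic handling.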

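What you are doing:
    r = requests.get(url)
This fetches the URL content and automatically decodes the gzip and deflate content encodings.
    self.wfile.write( bytes(r.text, 'UTF-8') )
This writes the decoded r.text, re-encoded as UTF-8 bytes, which is no longer the gzip-compressed body that the forwarded Content-Encoding: gzip and Content-Length headers describe.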
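
To make the mismatch concrete, here is a minimal sketch using only the standard library. It assumes the client decodes a gzip-labelled body with zlib in gzip mode; the sample JavaScript string and the client_decode helper are made up for illustration. Feeding it already-decoded text fails, because the body no longer starts with a gzip header.

```python
import gzip
import zlib

# Hypothetical payload standing in for the JavaScript file.
original = b"console.log('hello');"
compressed = gzip.compress(original)


def client_decode(body):
    """Decode a body labelled Content-Encoding: gzip (assumed client behaviour)."""
    decoder = zlib.decompressobj(16 + zlib.MAX_WBITS)  # 16 + MAX_WBITS = gzip format only
    return decoder.decompress(body)


# A body that really is gzip-compressed decodes fine.
assert client_decode(compressed) == original

# A body that was already decoded, but is still labelled gzip, does not.
try:
    client_decode(original)
    error_message = None
except zlib.error as exc:
    error_message = str(exc)

print(error_message)
```

This is the situation the proxy creates when it forwards the upstream headers unchanged but writes r.text as the body.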

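Change it as follows:
Read and write the body as a raw byte stream, so the response content is not transformed.
You can use this for other data as well, such as HTML responses.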

    r = requests.get(url, stream=True)
    ...
    self.wfile.write(r.raw.read())
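
As a rough sketch of how those two lines fit into the handler, the helper below forwards status, headers and the undecoded body in one place. The names relay and RecordingHandler are hypothetical, and the fake upstream object only imitates the attributes used here (status_code, headers, raw), so the example runs without a network connection.

```python
import gzip
import io
from types import SimpleNamespace


def relay(upstream, handler):
    """Forward status, headers and the still-encoded body to the client."""
    handler.send_response(upstream.status_code)
    for name, value in upstream.headers.items():
        handler.send_header(name, value)
    handler.end_headers()
    # .raw is read without undoing Content-Encoding, so the bytes still
    # match the Content-Encoding and Content-Length headers sent above.
    handler.wfile.write(upstream.raw.read())


class RecordingHandler:
    """Hypothetical stand-in for a BaseHTTPRequestHandler, for the demo only."""

    def __init__(self):
        self.status = None
        self.headers_sent = []
        self.wfile = io.BytesIO()

    def send_response(self, code):
        self.status = code

    def send_header(self, name, value):
        self.headers_sent.append((name, value))

    def end_headers(self):
        pass


# Fake upstream response shaped like requests.get(url, stream=True).
body = gzip.compress(b"var site = 1;")
upstream = SimpleNamespace(
    status_code=200,
    headers={"Content-Encoding": "gzip", "Content-Length": str(len(body))},
    raw=io.BytesIO(body),
)
client_side = RecordingHandler()
relay(upstream, client_side)
```

Inside do_GET this would be called as relay(requests.get(url, stream=True), self).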

Note from docs.python-requests.org:
Also read the chapter on Raw Response Content.
If you stream very large data, you must read it in chunks.
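
A minimal sketch of chunked copying, assuming only that the source is a file-like object with read(size), which r.raw is. The function name copy_in_chunks and the 64 KiB default are arbitrary choices for illustration; the demo uses in-memory buffers in place of a real response and socket.

```python
import io


def copy_in_chunks(source, sink, chunk_size=64 * 1024):
    """Copy source to sink in fixed-size reads; return the number of bytes copied."""
    total = 0
    while True:
        chunk = source.read(chunk_size)
        if not chunk:  # an empty read means the stream is exhausted
            break
        sink.write(chunk)
        total += len(chunk)
    return total


# Demo with in-memory streams standing in for r.raw and self.wfile.
payload = bytes(range(256)) * 1000
sink = io.BytesIO()
copied = copy_in_chunks(io.BytesIO(payload), sink, chunk_size=4096)
```

In the handler this would replace the single write with copy_in_chunks(r.raw, self.wfile), so memory use stays bounded by the chunk size. The standard library's shutil.copyfileobj performs the same kind of loop.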

Note: these are the default headers python-requests sends.
An 'Accept-Encoding': 'gzip, deflate' header is already included, so nothing needs to be done on the client side.
{'headers': {'Accept': '*/*', 
 'User-Agent': 'python-requests/2.11.1', 
 'Accept-Encoding': 'gzip, deflate', 
 'Connection': 'close', 
 'Host': 'httpbin.org'}
}
