Question

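I'm trying to create a web proxy in Python. It can fetch text from the main server, but not images. The URL http://gaia.cs.umass.edu/wireshark-labs/HTTP-wireshark-file1.html contains one line of text, which I can view in the browser, but the URL http://images.mid-day.com/images/2017/feb/15-salman-khan.jpg contains an image that I can't display in the browser. I'm using Google Chrome. My code is below. (I've hard-coded the hostname of the image URL for this post.) Can anyone help me fix the problem?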

from socket import *

# Listening socket that the browser connects to
client = socket(AF_INET, SOCK_STREAM)
proxy_port = 8880
client.bind(("", proxy_port))
client.listen(10)

while True:
    # Accept one browser connection and read its request
    client_connection, client_address = client.accept()
    request = client_connection.recv(102400).decode()

    if request.startswith("GET"):
        try:
            print(request)
            # Forward the request to the upstream server (hostname hard-coded for this post)
            web = socket(AF_INET, SOCK_STREAM)
            web.connect(("images.mid-day.com", 80))
            web.send(request.encode())
            # Read the reply and relay it back to the browser
            reply = web.recv(102400).decode()
            print(reply)
            client_connection.send(reply.encode())
            web.close()
        except:
            print("illegal req")
client.close()
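Independently of how many bytes are read, the code above also passes the upstream reply through `.decode()`. That works for the text page, but JPEG data begins with the byte `0xFF`, which is never valid in UTF-8. So if the received chunk includes any of the image body, `reply.decode()` raises `UnicodeDecodeError`, the bare `except:` swallows it, "illegal req" is printed, and nothing is sent to the browser. The sketch below reproduces this with a made-up response; `fake_reply` and `try_decode` are hypothetical names, and the bytes are not the real image.

```python
# Hypothetical response: ordinary HTTP headers followed by the first bytes of a JPEG.
fake_reply = (
    b"HTTP/1.1 200 OK\r\n"
    b"Content-Type: image/jpeg\r\n"
    b"\r\n"
    b"\xff\xd8\xff\xe0\x00\x10JFIF"
)


def try_decode(raw: bytes) -> str:
    """Mimic the question's reply.decode() call and report what happens."""
    try:
        raw.decode()  # the default codec is UTF-8
        return "decoded fine"
    except UnicodeDecodeError as exc:
        # exc.start is the index of the first byte UTF-8 rejected
        return f"UnicodeDecodeError at byte {exc.start}"


print(try_decode(b"HTTP/1.1 200 OK\r\n\r\n<html>hi</html>"))  # decoded fine
print(try_decode(fake_reply))  # UnicodeDecodeError at byte 45
```

Byte 45 is the first `0xFF` right after the blank line that ends the headers: the text part decodes, the binary body does not.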

This is the GET request my browser sends:

Answer 1

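You only read 102400 bytes from the upstream server, but the image response is (at least) 567702 bytes. You should keep reading until the upstream server closes the connection, and use sendall() to make sure all of the data is actually sent: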

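# Keep reading until the upstream server closes the connection (recv() returns b'')
chunks = []
while True:
    data = web.recv(4096)
    if not data:
        break
    chunks.append(data)
reply = b''.join(chunks)  # leave the reply as bytes: image data is binary, not text
client_connection.sendall(reply)  # sendall() keeps sending until every byte is written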
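Putting that advice together, here is a minimal sketch of a whole proxy, under several stated assumptions: the browser's request arrives in a single `recv()`, the `Host` header is a plain `host` or `host:port` (no IPv6 literals), and the upstream is plain HTTP. It keeps everything as bytes, reads until EOF, and takes the upstream host from the `Host` header instead of hard-coding `images.mid-day.com`. Reading until EOF only terminates if the upstream server actually closes the socket, so the sketch also rewrites the request to ask for `Connection: close`; without that, a keep-alive server could leave `recv()` blocked until its idle timeout. All function names (`recv_all`, `upstream_address`, `force_close`, `handle`, `serve`) are hypothetical.

```python
import socket
from typing import Tuple


def recv_all(sock: socket.socket, bufsize: int = 4096) -> bytes:
    """Read from sock until the peer closes its side (recv returns b'')."""
    chunks = []
    while True:
        data = sock.recv(bufsize)
        if not data:
            break
        chunks.append(data)
    return b"".join(chunks)  # one join instead of repeated bytes concatenation


def upstream_address(request: bytes) -> Tuple[str, int]:
    """Hypothetical helper: take (host, port) from the request's Host header.

    Assumes a plain host or host:port value (no IPv6 literals)."""
    head = request.split(b"\r\n\r\n", 1)[0]
    for line in head.split(b"\r\n")[1:]:  # skip the request line
        name, _, value = line.partition(b":")
        if name.strip().lower() == b"host":
            host, sep, port = value.strip().decode("ascii").partition(":")
            return host, int(port) if sep else 80
    raise ValueError("request has no Host header")


def force_close(request: bytes) -> bytes:
    """Hypothetical helper: drop Connection/Proxy-Connection headers and ask the
    upstream server to close the connection after its reply."""
    head, sep, body = request.partition(b"\r\n\r\n")
    lines = [
        line for line in head.split(b"\r\n")
        if not line.lower().startswith((b"connection:", b"proxy-connection:"))
    ]
    lines.append(b"Connection: close")
    return b"\r\n".join(lines) + sep + body


def handle(client_conn: socket.socket) -> None:
    """Relay one GET request to its upstream server and the reply back."""
    request = client_conn.recv(65536)  # assumes the whole request fits in one recv
    if not request.startswith(b"GET"):
        return
    with socket.create_connection(upstream_address(request)) as web:
        web.sendall(force_close(request))
        reply = recv_all(web)  # bytes throughout: never decode image data
    client_conn.sendall(reply)


def serve(port: int = 8880) -> None:
    """Accept browser connections forever, handling one at a time."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as listener:
        listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        listener.bind(("", port))
        listener.listen(10)
        while True:
            conn, _ = listener.accept()
            with conn:  # closing the socket tells the browser the reply is complete
                try:
                    handle(conn)
                except (OSError, ValueError) as exc:  # UnicodeDecodeError is a ValueError
                    print("request failed:", exc)
```

Calling `serve()` and pointing the browser's HTTP proxy setting at port 8880 should let both the text page and the image through. Reading until EOF is the simplest framing; a more complete proxy would instead honour `Content-Length` or chunked transfer encoding and could keep connections alive.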

Proxy server allows text but not images
