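I'm new to Python and I'm trying my luck with sockets. I wrote a simple HTTP client, but to my surprise it cannot fetch a web page that Firefox can, even though both send the same headers.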
import socket
clientsocket= socket.socket(socket.AF_INET, socket.SOCK_STREAM)
clientsocket.connect(("213.229.83.205",80))#connect to proxy at given address
print "connected to 213.229.83.205"
sdata= """GET http://google.co.ug/ HTTP/1.1
Host: google.co.ug
User-Agent: Mozilla/5.0 (Windows NT 6.1; rv:6.0) Gecko/20100101 Firefox/6.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip, deflate
Proxy-Connection: keep-alive
Cookie: cookie <-- Real cookie deleted
"""
print "sending request"
clientsocket.send(sdata);
rdata=clientsocket.recv(10240)
if not rdata: print "no data found"
else:
    print "receiving data !"
    myfile=open("c:/users/markdenis/desktop/google.html","w")
    myfile.write(str(rdata))
    myfile.close()
    print "data written to file on desktop"
clientsocket.close()
raw_input()#system(pause)
When I run it, it prints:
connected to 213.229.83.205
sending request
no data found
Answer 0 (score: 5)
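The HTTP protocol requires \r\n at the end of every header line and a blank line at the end of the header block. You never wrote explicit line endings into the sdata buffer, so a triple-quoted string like yours ends up with bare \n line endings only.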
Tested on Windows, Linux, and OS X to be sure:
>>> x = """a
b
c"""
>>> x
'a\nb\nc'
Whereas what you need is:
>>> x = "a\r\nb\r\nc\r\n"
>>> x
'a\r\nb\r\nc\r\n'
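If you would rather keep a triple-quoted string, one possible way to get from the first form to the second is to normalize the line endings after the fact. This is only a minimal sketch: `to_crlf` is a hypothetical helper name, not something from the standard library, and it assumes the text is plain ASCII header lines.

```python
def to_crlf(text):
    """Return text with every line terminated by CRLF."""
    # splitlines() strips "\n", "\r\n" and "\r" alike, so endings that are
    # already correct are not doubled; empty lines are preserved.
    return "".join(line + "\r\n" for line in text.splitlines())


x = """a
b
c"""
print(repr(to_crlf(x)))
```

A blank line in the input survives the conversion, so a header block that ends with an empty line still ends with the required empty line afterwards.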
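Add the \r\n and try again. Writing \r\n directly inside the triple-quoted buffer would leave an extra \n after each one, so build the string piece by piece instead: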
sdata = "GET http://google.co.ug/ HTTP/1.1\r\n"
sdata += "Host: google.co.ug\r\n"
sdata += "User-Agent: Mozilla/5.0 (Windows NT 6.1; rv:6.0) Gecko/20100101 Firefox/6.0\r\n"
sdata += "Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8\r\n"
sdata += "Accept-Language: en-us,en;q=0.5\r\n"
sdata += "Accept-Encoding: gzip, deflate\r\n"
sdata += "Proxy-Connection: keep-alive\r\n"
sdata += "\r\n"
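The same request can also be assembled from a list of header pairs, which makes it harder to forget a line ending. The sketch below is written for Python 3, where sockets take bytes; `build_request` is a hypothetical helper, and only a few of the headers from the question are shown.

```python
def build_request(method, url, headers):
    """Assemble an HTTP/1.1 request with CRLF line endings.

    headers is a list of (name, value) pairs.
    """
    lines = ["%s %s HTTP/1.1" % (method, url)]
    lines.extend("%s: %s" % (name, value) for name, value in headers)
    # Two trailing empty strings make join() emit the CRLF that ends the
    # last header plus the blank line that ends the header block.
    lines.extend(["", ""])
    return "\r\n".join(lines).encode("ascii")


sdata = build_request("GET", "http://google.co.ug/", [
    ("Host", "google.co.ug"),
    ("Accept-Language", "en-us,en;q=0.5"),
    ("Proxy-Connection", "keep-alive"),
])
print(repr(sdata))
```

The resulting bytes can be passed to `clientsocket.sendall(sdata)`, which keeps sending until the whole buffer has been written, whereas `send` may transmit only part of it.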