Question

I'm trying to go to http://www.py4inf.com/code/romeo.txt, read the contents of romeo.txt, and print them, using Python 3.6.1.

import socket

mysock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
mysock.connect(('www.py4inf.com', 80))
mysock.send('GET http://www.py4inf.com/code/romeo.txt HTTP/1.0\n\n'.encode("utf8"))

while True:
    data = mysock.recv(512)
    if ( len(data) < 1 ) :
        break
    print (data.decode("utf8"))

mysock.close()

Instead of the contents of the page, it prints the following:

HTTP/1.1 404 Not Found
Server: nginx
Date: Wed, 21 Jun 2017 03:00:15 GMT
Content-Type: text/html
Content-Length: 162
Connection: close

<html>
<head><title>404 Not Found</title></head>
<body bgcolor="white">
<center><h1>404 Not Found</h1></center>
<hr><center>nginx</center>
</body>
</html>

Why is this happening? Thanks in advance.

Answer 1

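In theory, the Host header is only required from HTTP/1.1 onward, but this particular server seems to require it even for HTTP/1.0 requests. I'm not sure whether that is Nginx's default behavior or whether the server's administrator explicitly configured it that way.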

In any case, try changing your request to the following:

mysock.send('GET http://www.py4inf.com/code/romeo.txt HTTP/1.0\r\nHost: www.py4inf.com\r\n\r\n'.encode("utf8"))
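For completeness, here is a minimal sketch of the whole client with a Host header included, assuming plain HTTP on port 80 and a UTF-8 text body. `fetch` and `split_response` are hypothetical helper names, not part of any library. It differs from the question's loop in two ways: it collects every chunk before decoding, since decoding each 512-byte read on its own can split a multi-byte UTF-8 character, and it separates the response headers from the body so the file contents can be printed on their own. Because the request is HTTP/1.0, the server closes the connection after responding, so an empty `recv` result marks the end of the response. The sketch uses the path form `/code/romeo.txt` in the request line; the absolute URL used above works as well.

```python
import socket


def split_response(raw: bytes):
    """Split a raw HTTP response into (header_text, body_bytes)."""
    # Headers and body are separated by the first empty line (CRLF CRLF).
    head, sep, body = raw.partition(b"\r\n\r\n")
    if not sep:
        # Fall back to bare-LF separators for lenient servers.
        head, sep, body = raw.partition(b"\n\n")
    # Header bytes are decoded as ISO-8859-1, which never raises.
    return head.decode("iso-8859-1"), body


def fetch(host: str, path: str, port: int = 80) -> bytes:
    """Send an HTTP/1.0 GET with a Host header and return the raw response."""
    request = (
        f"GET {path} HTTP/1.0\r\n"
        f"Host: {host}\r\n"  # the header this server turned out to need
        "\r\n"               # blank line ends the request headers
    ).encode("ascii")
    chunks = []
    # create_connection resolves the host and connects; the with-block
    # closes the socket even if an error occurs.
    with socket.create_connection((host, port)) as sock:
        sock.sendall(request)
        while True:
            data = sock.recv(512)
            if not data:  # empty bytes: the server closed the connection
                break
            chunks.append(data)
    return b"".join(chunks)
```

Usage, assuming the host is still reachable: `headers, body = split_response(fetch("www.py4inf.com", "/code/romeo.txt"))`, then `print(body.decode("utf-8"))`. Check the status line in `headers` first; a 404 body is an HTML error page, not the text file.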

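I can understand your confusion. IMHO, if the server insists on a Host header, it should return 400 (Bad Request) rather than 404, since the problem is with the client's request, not a missing resource.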
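If a client should report this kind of failure instead of printing the error page, it can check the status code before using the body. A minimal sketch, assuming the response begins with a well-formed status line such as `HTTP/1.1 404 Not Found`; `parse_status_line` is a hypothetical helper, not a library function. Note that with this server a missing Host header shows up as a 404 rather than a 400, so the code alone cannot tell the two causes apart.

```python
def parse_status_line(raw: bytes):
    """Return (version, status_code, reason) from the first line of a raw HTTP response."""
    # The status line ends at the first line break; tolerate CRLF or bare LF.
    first_line = raw.split(b"\n", 1)[0].rstrip(b"\r").decode("iso-8859-1")
    parts = first_line.split(" ", 2)
    version, code = parts[0], int(parts[1])
    reason = parts[2] if len(parts) == 3 else ""  # reason phrase may be empty
    return version, code, reason


# The status line the question's program received:
version, code, reason = parse_status_line(b"HTTP/1.1 404 Not Found\r\nServer: nginx\r\n\r\n")
if code == 400:
    print("Server rejected the request itself (for example, a missing Host header).")
elif code == 404:
    print("Server reports no resource at that path.")
```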
