Question

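As part of a homework assignment, I am trying to write code in Python 2.7.13 that downloads the web page at a specified URL together with all image objects referenced by that page. I have been strongly instructed to use socket programming for this task. I am not allowed to use any higher-level libraries such as urllib2, urllib3, etc.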

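I wrote some simple code that uses socket programming to download a web page as an HTML file, and on my laptop I can download one specific page with it. I also tried to download any images on that page, but that clearly does not work: if the page contains images, the saved HTML file does not display them. My code is also inefficient in the sense that it gives the user no way to specify the URL. How can I modify the code so that it takes a URL as input from the user and downloads the web page along with all of the image objects on that page?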

Here is my code:

import socket

# Set up a TCP/IP socket
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)

# Connect as client to a selected server
# on a specified port
s.connect(("www.automatetheboringstuff.com", 80))

# Protocol exchange - sends and receives
# HTTP header lines end in CRLF; the Host header names the site being requested
s.sendall("GET /chapter16/ HTTP/1.0\r\nHost: www.automatetheboringstuff.com\r\n\r\n")
html_file = open('download_webpage.html', 'wb')  # binary mode keeps the bytes unchanged
picture = ""
while True:
    resp = s.recv(4096)
    if resp == "":
        break
    print resp,
    html_file.write(resp)  # raw response, HTTP headers included
    picture = picture + resp

# Close the connection when completed
html_file.close()
s.close()

# Look for the end of the header (2 CRLF)
pos = picture.find("\r\n\r\n")
print 'Header length', pos
print picture[:pos]

# Skip past the header and save the picture data
# (this is the body of the /chapter16/ response; no image URL is ever requested)
picture = picture[pos+4:]
fhand = open("stuff.jpg", "wb")
fhand.write(picture)
fhand.close()
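
For reference, here is a minimal sketch of how the two missing pieces (taking the URL as input and requesting each image separately) might look using only the socket and re modules. It assumes plain http:// URLs and a server that answers an HTTP/1.0 request directly; HTTPS, redirects and ".." path segments are not handled. The helper names split_url, split_response, fetch, image_urls and download are hypothetical, and the regular expression is only a rough stand-in for real HTML parsing. Whether re is permitted under the assignment's rules is also an assumption.

```python
import re
import socket


def split_url(url):
    """Split 'http://host[:port]/path' into (host, port, path). Plain HTTP only."""
    if url.startswith("http://"):
        url = url[len("http://"):]
    host, _, path = url.partition("/")
    path = "/" + path
    port = 80
    if ":" in host:
        host, port_text = host.split(":", 1)
        port = int(port_text)
    return host, port, path


def split_response(raw):
    """Separate the HTTP header block from the body at the first blank line."""
    header, _, body = raw.partition(b"\r\n\r\n")
    return header, body


def fetch(url):
    """Send one HTTP/1.0 GET over a raw socket and return (header, body) as bytes."""
    host, port, path = split_url(url)
    request = "GET %s HTTP/1.0\r\nHost: %s\r\n\r\n" % (path, host)
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.connect((host, port))
    s.sendall(request.encode("ascii"))  # assumes an ASCII-only URL
    chunks = []
    while True:
        data = s.recv(4096)
        if not data:  # the server closes the connection when it is done
            break
        chunks.append(data)
    s.close()
    return split_response(b"".join(chunks))


# Rough pattern for <img ... src="..."> tags; not a full HTML parser.
IMG_SRC = re.compile(r'<img[^>]+src\s*=\s*["\']([^"\']+)["\']', re.IGNORECASE)
HAS_SCHEME = re.compile(r'[a-zA-Z][a-zA-Z0-9+.-]*:')


def image_urls(html, page_url):
    """Return absolute http:// URLs for the <img> tags found in html."""
    host, port, path = split_url(page_url)
    base = "http://%s" % host if port == 80 else "http://%s:%d" % (host, port)
    path = path.split("?")[0]
    folder = path[:path.rfind("/") + 1]  # directory part of the page's path
    urls = []
    for src in IMG_SRC.findall(html):
        if src.startswith("http://"):
            urls.append(src)
        elif src.startswith("//"):  # protocol-relative reference
            urls.append("http:" + src)
        elif HAS_SCHEME.match(src):  # https:, data:, etc. are skipped here
            continue
        elif src.startswith("/"):  # relative to the site root
            urls.append(base + src)
        else:  # relative to the page's own folder
            urls.append(base + folder + src)
    return urls


def download(url):
    """Save the page body, then request and save each image it references."""
    header, body = fetch(url)
    with open("download_webpage.html", "wb") as page:
        page.write(body)
    html = body.decode("utf-8", "replace")
    for i, img in enumerate(image_urls(html, url)):
        _, data = fetch(img)
        name = "image_%d_%s" % (i, img.split("?")[0].rsplit("/", 1)[-1])
        with open(name, "wb") as out:
            out.write(data)
```

In Python 2.7, calling download(raw_input("URL: ")) would prompt for the address and then save the page plus whatever images the pattern finds. The saved HTML still refers to the images by their original src values, so it will only display them locally if the files end up at matching relative paths.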

Downloading a web page and its images using Python 2.7

0 answers: