Problem sending binary files via sockets in Python

Time: 2018-10-09 13:53:17

Tags: python file sockets

I am trying to write a program that transfers binary files from a client to a server. Here is the code:

Client (sends the file)

  def send_file(self,filename):
        print("Sending: " + filename)
        size = self.BUFFER_SIZE
        with open(filename,'rb') as f:
            raw = f.read().decode()
        buffer = [raw[i:i + size] for i in range(0, len(raw), size)]
        for x in range(len(buffer)):
            self.sock.sendall(buffer[x].encode())

        return

Server (receives the file)

def recv_file(self, conn, filename):
    packet = ""
    buffer = ""
    while True:
        buffer = conn.recv(self.BUFFER_SIZE)
        packet = packet + str(buffer.decode())
        if not len(buffer) == self.BUFFER_SIZE:
            break
    with open(filename, 'wb') as f:
        f.write(bytes(packet.encode()))
    #print(packet)
    return 

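This way I can transfer txt files, but when I have to transfer a jpeg or any other type of file, it freezes in the loop. Can someone explain why? I'm new to Python and trying to learn.

2 answers:

Answer 0: (score: 2)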

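It shouldn't freeze if both sides use the same encoding, but it could easily die with an exception.

You're reading and sending in binary (good), but then inexplicably calling decode to get a str and encode to get back to bytes (bad). The problem is that arbitrary binary data is not guaranteed to be decodable in any given encoding. With UTF-8 (which is what a bare decode() or encode() uses in Python 3, and a common locale encoding as well), the data is very likely not legal. With latin-1 it is legal, but pointless.

Worse, if the client and server end up using different encodings, the result of decoding could differ on each side (and so the lengths won't match).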
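As a small illustration of the decoding problem, here is a sketch that uses four bytes of the kind a JPEG file typically starts with. The byte values are sample data chosen for this example, not taken from the question.

```python
# Sample bytes for illustration only: a typical JPEG file begins like this.
data = bytes([0xFF, 0xD8, 0xFF, 0xE0])

# UTF-8 rejects the data: the byte 0xFF never appears in valid UTF-8.
try:
    data.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

# latin-1 maps every byte to a code point, so decoding always "works",
# and encoding back with latin-1 restores the original bytes.
as_text = data.decode("latin-1")
latin1_round_trip = as_text.encode("latin-1")

# Mixing encodings (decode as latin-1, encode as UTF-8) changes the length.
mixed = as_text.encode("utf-8")

print(utf8_ok, latin1_round_trip == data, len(data), len(mixed))
```

The latin-1 round trip is lossless but buys nothing, and the mixed case shows how a length mismatch can appear when the two sides disagree on the encoding.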

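Use bytes consistently, don't convert to and from strings, and encoding settings won't matter. Your code will also run faster. You also need to actually send the file length ahead of time. Your loop is hoping recv will return a short length only when the file is done, but if:

  1. The file is an exact multiple of the buffer size, or
  2. The socket happens to deliver data in chunks that don't match the buffer size

then the loop misbehaves: deterministically in case #1, where the final recv is full-sized, no short read ever arrives, and the loop blocks waiting for more data; and by coincidence in case #2, where a short recv can arrive mid-file and end the loop early.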
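Both failure modes can be reproduced locally. The sketch below is an illustration under assumptions: it uses socket.socketpair() in place of a real client and server, a deliberately tiny BUFFER_SIZE, and a timeout so that a stuck recv raises instead of hanging. The function name recv_until_short_read is made up for this example and mimics the loop from the question.

```python
import socket

BUFFER_SIZE = 4  # tiny buffer, chosen only to keep the demo readable

def recv_until_short_read(conn):
    """Mimic the question's loop: stop at the first short recv()."""
    data = b""
    while True:
        chunk = conn.recv(BUFFER_SIZE)
        data += chunk
        if len(chunk) != BUFFER_SIZE:
            return data

# A connected pair of local sockets stands in for client and server.
sender, receiver = socket.socketpair()
receiver.settimeout(0.5)  # so a stuck recv() raises instead of hanging

# Case 1: the payload is an exact multiple of BUFFER_SIZE, so no short
# read ever arrives and the loop waits for data that will never come.
sender.sendall(b"12345678")
try:
    recv_until_short_read(receiver)
    case1_blocked = False
except socket.timeout:
    case1_blocked = True

# Case 2: a piece smaller than BUFFER_SIZE arrives mid-transfer, so the
# loop stops early even though more data is still on the way.
sender.sendall(b"12")
early = recv_until_short_read(receiver)
sender.sendall(b"345678")  # the rest is never read by the loop

print(case1_blocked, early)
sender.close()
receiver.close()
```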

A safer approach is to prefix the transmission with the file length, rather than hoping the chunking works as expected:

def send_file(self,filename):
    print("Sending:", filename)
    with open(filename, 'rb') as f:
        raw = f.read()
    # Send actual length ahead of data, with fixed byteorder and size
    self.sock.sendall(len(raw).to_bytes(8, 'big'))
    # You have the whole thing in memory anyway; don't bother chunking
    self.sock.sendall(raw)

def recv_file(self, conn, filename):
    # Get the expected length (eight bytes long, always)
    expected_size = b""
    while len(expected_size) < 8:
        more_size = conn.recv(8 - len(expected_size))
        if not more_size:
            raise Exception("Short file length received")
        expected_size += more_size

    # Convert to int, the expected file length
    expected_size = int.from_bytes(expected_size, 'big')

    # Until we've received the expected amount of data, keep receiving
    packet = b""  # Use bytes, not str, to accumulate
    while len(packet) < expected_size:
        buffer = conn.recv(expected_size - len(packet))
        if not buffer:
            raise Exception("Incomplete file received")
        packet += buffer
    with open(filename, 'wb') as f:
        f.write(packet)
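Both receive loops above follow the same pattern: keep calling recv until exactly n bytes have arrived. A minimal sketch of that pattern as a reusable function follows. The names recv_exact, send_blob, and recv_blob are hypothetical (they are not part of the socket module), and socket.socketpair() plus a thread stand in for a real client and server.

```python
import socket
import threading

def recv_exact(conn, n):
    """Hypothetical helper: read exactly n bytes or raise if the peer closes."""
    chunks = []
    remaining = n
    while remaining > 0:
        chunk = conn.recv(remaining)
        if not chunk:
            raise ConnectionError("connection closed before all data arrived")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)

def send_blob(sock, payload):
    """Send an 8-byte big-endian length prefix followed by the payload."""
    sock.sendall(len(payload).to_bytes(8, "big"))
    sock.sendall(payload)

def recv_blob(conn):
    """Read the 8-byte length prefix, then exactly that many payload bytes."""
    size = int.from_bytes(recv_exact(conn, 8), "big")
    return recv_exact(conn, size)

# Round trip over a local socket pair; the payload is arbitrary binary data.
payload = bytes(range(256)) * 1000
client, server = socket.socketpair()
t = threading.Thread(target=send_blob, args=(client, payload))
t.start()
received = recv_blob(server)
t.join()
client.close()
server.close()
print(received == payload)
```

Collecting the chunks in a list and joining them once avoids repeatedly copying a growing bytes object, which can matter for large files.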

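Answer 1: (score: 1)

As an addendum to ShadowRanger's post, if you do want to keep the file chunking without using socket.sendfile, you can use a few tricks to clean up your code and reduce the memory footprint.

The sending process is fairly simple: we copy the step of sending the file size from ShadowRanger, and add a very simple loop that sends chunks of data until a chunk comes back empty (end of file).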

import os

def send_file(self,filename):
    print("Sending: " + filename)
    #send file size as big endian 64 bit value (8 bytes)
    self.sock.sendall(os.stat(filename).st_size.to_bytes(8,'big'))
    with open(filename,'rb') as f: #open our file to read
        while True:
            chunk = f.read(self.BUFFER_SIZE) #get next chunk
            if not chunk: #empty chunk indicates EOF
                break
            self.sock.sendall(chunk) #send the chunk

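Receiving the file is also straightforward: first read the expected file size using the same process, then loop, writing received data into the file until the expected size is reached. While receiving, f.tell() serves as an easy way to tell whether the whole file has arrived.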

def recv_file(self, conn, filename):
    # file size transfer copied from ShadowRanger
    # Get the expected length (eight bytes long, always)
    expected_size = b"" #buffer to read in file size
    while len(expected_size) < 8: #while buffer is smaller than 8 bytes
        more_size = conn.recv(8 - len(expected_size)) #read up to remaining bytes
        if not more_size: #nothing was read
            raise Exception("Short file length received")
        expected_size += more_size #extend buffer
    expected_size = int.from_bytes(expected_size, 'big') #Convert to int, the expected file length
    with open(filename, 'wb') as f: #open our file to write
        while f.tell() < expected_size: #while it's smaller than our expected size
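            #recv() requires a size; never ask for more than the file has left
            remaining = expected_size - f.tell()
            bytes_recvd = conn.recv(min(self.BUFFER_SIZE, remaining))
            if not bytes_recvd: #connection closed before the whole file arrived
                raise Exception("Incomplete file received")
            f.write(bytes_recvd)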
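For comparison, here is a rough sketch of the socket.sendfile alternative mentioned at the start of this answer (available since Python 3.5). It assumes the same 8-byte big-endian size header as above. The function names send_file_with_sendfile and read_n, the temporary file, and the local socket pair are all made up for the demo.

```python
import os
import socket
import tempfile
import threading

def send_file_with_sendfile(sock, filename):
    """Sketch: 8-byte big-endian size header, then the file via sendfile()."""
    with open(filename, "rb") as f:  # sendfile() needs a binary-mode file
        size = os.fstat(f.fileno()).st_size
        sock.sendall(size.to_bytes(8, "big"))
        sock.sendfile(f)

def read_n(conn, n):
    """Hypothetical helper: read exactly n bytes or raise if the peer closes."""
    data = b""
    while len(data) < n:
        part = conn.recv(n - len(data))
        if not part:
            raise ConnectionError("connection closed early")
        data += part
    return data

# Demo with a temporary file and a local socket pair (both illustrative).
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(os.urandom(100_000))
    path = tmp.name

client, server = socket.socketpair()
t = threading.Thread(target=send_file_with_sendfile, args=(client, path))
t.start()

# Receive: read the size header, then exactly that many bytes.
size = int.from_bytes(read_n(server, 8), "big")
body = read_n(server, size)
t.join()
client.close()
server.close()

with open(path, "rb") as f:
    original = f.read()
os.remove(path)
print(body == original)
```

Where the platform supports it, sendfile lets the operating system copy the file to the socket directly; otherwise Python falls back to ordinary send calls, so the receiving side does not change either way.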