Question

My goal is to create an HTTP request (headers and body) manually. It looks like this:

Some-Header1: some value1
Some-Header2: some value2
Some-Header3: some value3

-------------MyBoundary
Content-Disposition: form-data; name="file_content_0"; filename="123.pdf"
Content-Length: 93
Content-Type: application/pdf
Content-Transfer-Encoding: binary

  ==== here is the binary data of 123.pdf ====
  ==== here is the binary data of 123.pdf ====
  ==== here is the binary data of 123.pdf ====
  ==== here is the binary data of 123.pdf ====

-------------MyBoundary--

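I found that this is the only way to send a file to the web service through its API, because I sniffed the traffic of a Ruby script and the result looked like what I've shown above.

So headers such as "Some-Header1" and the others are plain-text headers. Note that there is also "-------------MyBoundary--" after "==== here is the binary data of 123.pdf ====".

But "==== here is the binary data of 123.pdf ====" is binary data.

The question is: how do I link (combine) plain-text data with binary data?

P.S. I have been trying to achieve this with standard libraries such as Python's requests and failed. At this point I am not going to consider using them again. Right now I only need to know how to combine plain text and binary data.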

Update:

How can I easily embed binary data into a string?

import textwrap

body_headers = textwrap.dedent(
    """
    -------------MyBoundary
    Content-Disposition: form-data; name="file_content_0"; filename="a.c"
    Content-Length: 1234
    Content-Type: image/jpeg
    Content-Transfer-Encoding: binary

                    %b ??? -> to indicate that a binary data will be placed here

    -------------MyBoundary--


    """
) % binary_data  # ???

Update 2:

text1 = textwrap.dedent(
    """
    -------------MyBoundary
    Content-Disposition: form-data; name="file_content_0"; filename="a.pdf"
    Content-Length: 1234
    Content-Type: image/jpeg
    Content-Transfer-Encoding: binary

    replace_me

    -------------MyBoundary--


    """
)

with open("test1.pdf", "rb") as file_hander:
    binary_data = file_hander.read()

print (isinstance(binary_data, str)) # True
print (isinstance("replace_me", str)) # True

print text1.replace("replace_me", binary_data) # --> [Decode error - output not utf-8]

print text1.replace("replace_me", binary_data).encode("utf-8") # exception

Error:

UnicodeDecodeError: 'ascii' codec can't decode byte 0xc4 in position 195: ordinal not in range(128)

This also gives me an exception:

print unicode(text1.replace("replace_me", binary_data), "utf-8")
# UnicodeDecodeError: 'utf8' codec can't decode byte 0xc4 in position 195: invalid continuation byte

Answer 1

To load binary data from a file, you can do

with open(file_name, 'rb') as the_file:
    binary_data = the_file.read()

Now there are two cases, depending on your Python version:

Python 2 - `unicode` and `str`

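`binary_data` will be a `str`, and concatenation should work perfectly fine unless your other string is a `unicode`, in which case you should probably encode it (almost no network function needs `unicode` in Python 2):

normal_str = unicode_str.encode(encoding)

where `encoding` is usually something like `"utf-8"`, `"utf-16"` or `"latin-1"`, but it could be something more exotic.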

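Python 3 - `bytes` and `str`

`binary_data` will be a `bytes` object, which you cannot simply concatenate with a default `str`. If whatever you use to send the data needs `bytes`, follow the same encoding approach as for Python 2. If it needs `str` (probably unlikely for networking), you have to decode the bytes with the right encoding (as this is almost impossible to guess, you should check which encoding the file uses):

normal_str = byte_str.decode(encoding)

Again, pass the encoding as a parameter (hint: `"latin-1"` should be fine, since it preserves the bytes, while others such as `"utf-8"` may fail on real binary data, which is not an encoded string) [HT to @SergeBallesta].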
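The hint about `"latin-1"` can be checked directly. The following is a small Python 3 sketch, not part of the original answer; the variable names are made up for illustration. It decodes every possible byte value with both codecs and records what happens.

```python
# Every possible byte value, standing in for arbitrary binary file content.
raw = bytes(range(256))

# latin-1 maps byte N to code point N, so decoding cannot fail and
# encoding the result returns exactly the original bytes.
as_text = raw.decode("latin-1")
round_trip_ok = as_text.encode("latin-1") == raw

# utf-8 only accepts valid UTF-8 sequences, so arbitrary bytes are rejected.
try:
    raw.decode("utf-8")
    utf8_failed = False
except UnicodeDecodeError:
    utf8_failed = True

print(round_trip_ok, utf8_failed)  # True True
```

The round trip only holds if the same codec is used in both directions; decoding with `"latin-1"` and later encoding with `"utf-8"` would change every byte above 0x7F.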

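To avoid such problems in Python 3, you may want to define your headers as `bytes` from the start, using `something = b"whatever"` instead of `something = "whatever"` (note the added `b`), and open any other input files that feed into the headers in binary mode as well. Then simply concatenating everything with `+` should not be a problem.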
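As a rough Python 3 sketch of that advice, the body from the question could be assembled as follows. The helper name `build_body` and the sample bytes are invented for illustration, and the CRLF line endings are an assumption based on the usual multipart convention rather than something the question states.

```python
# Delimiter line copied from the question; everything else is illustrative.
DELIMITER = b"-------------MyBoundary"
CRLF = b"\r\n"  # assumed line ending


def build_body(field_name, filename, content_type, binary_data):
    """Return the multipart body as one bytes object (hypothetical helper)."""
    part_headers = [
        'Content-Disposition: form-data; name="%s"; filename="%s"'
        % (field_name, filename),
        "Content-Length: %d" % len(binary_data),
        "Content-Type: %s" % content_type,
        "Content-Transfer-Encoding: binary",
    ]
    # The text lines are encoded to bytes once, before any concatenation.
    head = CRLF.join(line.encode("ascii") for line in part_headers)
    return (
        DELIMITER + CRLF
        + head + CRLF + CRLF      # blank line ends the part headers
        + binary_data + CRLF      # raw file bytes, never decoded
        + DELIMITER + b"--" + CRLF
    )


# Stand-in for the_file.read(); contains bytes that are not valid UTF-8.
fake_pdf = b"%PDF-1.4\n\xc4\xe5\xf2\x00\xff"
body = build_body("file_content_0", "123.pdf", "application/pdf", fake_pdf)
```

Because only the text parts are ever encoded, the file content passes through untouched, which is what the failing `replace(...).encode("utf-8")` attempt in the question could not guarantee.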

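Sending the HTTP request

To send such raw data to a server, you have different options:

If you want more control than `urllib` (or `urllib2`) and `requests` give you, you can do low-level networking with raw sockets, using `socket`, to send whatever data you like (the example in the docs is a good illustration of how to achieve this).
You can pass the data (everything between the `---(snip)--MyBoundary` lines, including those lines themselves) as request data to a POST request (if your HTTP request is one; the question does not specify), using `urllib` or `requests`.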
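The second option might look like the sketch below, which uses Python 3's `urllib.request`. The URL and the helper name `make_upload_request` are hypothetical, and the sketch assumes the usual multipart convention that the `boundary` parameter in `Content-Type` is the delimiter line without its two leading dashes. The request is only constructed here, not sent.

```python
import urllib.request

DELIMITER = "-------------MyBoundary"          # delimiter line from the question
UPLOAD_URL = "http://example.invalid/upload"   # hypothetical endpoint


def make_upload_request(url, body):
    """Wrap an already assembled bytes body in a POST request (sketch)."""
    # Assumption: the boundary parameter is the delimiter minus the leading "--".
    boundary = DELIMITER[2:]
    headers = {"Content-Type": "multipart/form-data; boundary=" + boundary}
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


# A tiny hand-built body; the payload bytes are a stand-in for real file data.
body = (
    DELIMITER.encode("ascii") + b"\r\n"
    + b'Content-Disposition: form-data; name="file_content_0"; filename="123.pdf"\r\n'
    + b"\r\n"
    + b"\xc4\x00\xff"
    + b"\r\n" + DELIMITER.encode("ascii") + b"--\r\n"
)
request = make_upload_request(UPLOAD_URL, body)
# urllib.request.urlopen(request) would transmit it; not called in this sketch.
```

Since `data` is a `bytes` object, `urllib` sends it as is, so no text encoding step can corrupt the binary part.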

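Efficiency

If you go with raw sockets and send really large files, you may want to read the file in chunks (using `the_file.read(number_of_bytes)`) and write each chunk directly to the socket (using `the_socket.send(read_binary_data)`).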
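A minimal sketch of that chunked loop follows. The function name `send_in_chunks` and the default chunk size are made up for illustration. It uses `sendall` rather than the `send` named above, because `send` may transmit only part of a chunk and would need its return value checked. A local `socket.socketpair()` stands in for a real server connection.

```python
import io
import socket


def send_in_chunks(the_file, the_socket, number_of_bytes=64 * 1024):
    """Stream a binary file object to a socket without loading it whole."""
    total = 0
    while True:
        chunk = the_file.read(number_of_bytes)
        if not chunk:          # empty bytes means end of file
            break
        the_socket.sendall(chunk)
        total += len(chunk)
    return total


# Demo with a connected socket pair instead of a remote server.
sender, receiver = socket.socketpair()
payload = io.BytesIO(b"\x00\xc4\xff" * 100)   # stand-in for an open "rb" file
sent = send_in_chunks(payload, sender, number_of_bytes=64)
sender.close()

received = b""
while True:
    data = receiver.recv(4096)
    if not data:
        break
    received += data
receiver.close()
```

In a real request the text headers and the opening delimiter would be sent first, then the file chunks, then the closing delimiter.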

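Re: Update

Regarding the update (which should really be a new question...): `bytes` has no `str.format`-style syntax (`"{}"`), and the old `"%s"` style is only available on `bytes` from Python 3.5 onward. You either need to use `decode` on the `bytes` object to turn it into a string and use format strings properly, or use `encode` to turn the string into `bytes` and use regular concatenation. Also note that `replace` on `bytes` only accepts `bytes` arguments, so mixing `str` and `bytes` there raises a `TypeError`.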
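For the `replace_me` template from the update, the options can be compared side by side. This is an illustrative Python 3 sketch with made-up sample data; the `%b` line requires Python 3.5 or later.

```python
binary_data = b"\xc4\x00\xff"   # stand-in for the bytes read from the file
template = b"header line\r\n\r\nreplace_me\r\ntrailer\r\n"

# Option 1: plain concatenation of bytes objects.
by_concat = b"header line\r\n\r\n" + binary_data + b"\r\ntrailer\r\n"

# Option 2: bytes.replace, with bytes for both arguments.
by_replace = template.replace(b"replace_me", binary_data)

# Option 3: %-formatting on bytes (Python 3.5+).
by_percent = b"header line\r\n\r\n%b\r\ntrailer\r\n" % binary_data

# Mixing str and bytes is rejected instead of silently converted.
try:
    template.replace("replace_me", binary_data)
    mixing_rejected = False
except TypeError:
    mixing_rejected = True
```

All three options produce the same bytes. Concatenation is the simplest choice if the payload could itself contain the placeholder text.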
