Question

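I am trying to send a message over a TCP socket from a Java application and read it in Python 2.7. I want the first 4 bytes to specify the message length, so that I can do this: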

header = socket.recv(4)
message_length = struct.unpack(">L", header)[0]  # unpack returns a tuple
message = socket.recv(message_length)

on the Python side.

On the Java side:

out = new PrintWriter(new BufferedWriter(new OutputStreamWriter(socket.getOutputStream())),true);
byte[] bytes = ByteBuffer.allocate(4).putInt(message_length).array();
String header = new String(bytes, Charset.forName("UTF-8"));
String message_w_header = header.concat(message);
out.print(message_w_header);

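This works for some message lengths (10 or 102 characters) but fails for others (for example, 1017 characters). If I print the value of each byte in the failing case, I get: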

Java:
Bytes 0 0 3 -7
Length 1017
Hex string 3f9

Python:
Bytes 0 0 3 -17
Length 1007
Hex string \x00\x00\x03\xef

I think this has something to do with bytes being signed in Java and unsigned in Python, but I can't figure out what I should do to make it work.

Answer 1

The problem is on the Java side: b'\x03\xf9' is not a valid UTF-8 byte sequence:

>>> b'\x03\xf9'.decode('utf-8')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xf9 in position 1: invalid start byte

It seems that new String(bytes, Charset.forName("UTF-8")) uses the 'replace' error handler. b'\xef' is the first of the three bytes of the '\ufffd' Unicode replacement character encoded in UTF-8:

>>> b'\x03\xf9'.decode('utf-8', 'replace').encode('utf-8')
b'\x03\xef\xbf\xbd'

That is why you receive b'\x03\xef' instead of b'\x03\xf9' in Python.
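To see where the 1007 in the question comes from, here is a small sketch that replays the round trip in Python. It assumes the Java side effectively does bytes -> String -> bytes with UTF-8 and replacement of malformed input (the PrintWriter may actually use the platform default charset); `corrupt_like_java_string` is a made-up helper name, not anything from Java or the standard library.

```python
import struct


def corrupt_like_java_string(header):
    # Assumed model of the Java side: decode the raw bytes as UTF-8,
    # replacing malformed input with U+FFFD, then encode the text again.
    return header.decode("utf-8", "replace").encode("utf-8")


header = struct.pack(">L", 1017)             # b'\x00\x00\x03\xf9'
on_wire = corrupt_like_java_string(header)   # 0xf9 became 0xef 0xbf 0xbd
(received_length,) = struct.unpack(">L", on_wire[:4])
print(received_length)  # 1007
```

Headers for lengths such as 10 or 102 contain only bytes below 0x80, which are valid UTF-8 on their own, so they survive the round trip unchanged. That matches the observation that short messages worked.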

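To fix it, send bytes from Java rather than Unicode text: write the 4 header bytes and the encoded message directly to socket.getOutputStream() instead of passing them through a String and a PrintWriter.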
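For reference, this is the byte layout the Python reader expects the Java side to produce, sketched in Python rather than Java. `frame_message` is a hypothetical name, and the sketch assumes the message body is UTF-8 text and that the header counts encoded bytes, not characters (the two differ for non-ASCII text).

```python
import struct


def frame_message(text):
    # Encode first, then measure: the header must count bytes on the wire.
    payload = text.encode("utf-8")
    # ">L" is a 4-byte big-endian unsigned integer, the same byte order
    # that ByteBuffer.putInt uses by default in Java.
    return struct.pack(">L", len(payload)) + payload


frame = frame_message("x" * 1017)
print(repr(frame[:4]))
```

Whatever the Java code ends up looking like, the first four bytes it writes for a 1017-byte message should be 00 00 03 f9, with no text encoding step applied to them.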

Unrelated: sock.recv(n) may return fewer than n bytes. If the socket is blocking, you can create a file-like object with file = sock.makefile('rb') and call file.read(n) to read exactly n bytes.
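A minimal sketch of the makefile approach, assuming a blocking socket and the 4-byte big-endian header from the question. `read_message` is an illustrative name, and a local `socket.socketpair()` stands in for the Java peer so the example runs on its own.

```python
import socket
import struct


def read_message(stream):
    # read(n) on the buffered file object blocks until n bytes arrive
    # or the peer closes the connection, so only EOF can shorten it.
    header = stream.read(4)
    if len(header) < 4:
        raise EOFError("connection closed before a full header arrived")
    (length,) = struct.unpack(">L", header)
    payload = stream.read(length)
    if len(payload) < length:
        raise EOFError("connection closed before the full message arrived")
    return payload


# Local demonstration: a connected socket pair replaces the Java sender.
sender, receiver = socket.socketpair()
body = b"x" * 1017
sender.sendall(struct.pack(">L", len(body)) + body)
sender.close()

stream = receiver.makefile("rb")
message = read_message(stream)
print(len(message))  # 1017
stream.close()
receiver.close()
```

The short-read checks matter because read(n) still returns fewer bytes when the connection is closed early. Without them a truncated message would be passed on silently.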

How do I send a 4-byte header from Java and read it in Python?
