Python code:
import base64
import gzip
import io

decoded = base64.b64decode(base64input)
resultBytes = b""
i = 0
while i < len(decoded):
    # Each record starts with two big-endian 16-bit values:
    # c = chunk length, d = number of decompressed bytes to read.
    c = decoded[i + 0] * 256 + decoded[i + 1]
    d = decoded[i + 2] * 256 + decoded[i + 3]
    lenRead = 0
    # Prepend the first four bytes of a gzip header (magic, deflate, no flags).
    gzchunk = (bytes((31,139,8,0)) + decoded[i:i+c])
    try:
        with gzip.GzipFile(fileobj=io.BytesIO(gzchunk)) as gf:
            while True:
                readSize = min(16384, d - lenRead)
                readBytes = gf.read(size=readSize)
                lenRead += len(readBytes)
                resultBytes += readBytes
                if len(readBytes) == 0 or (d - lenRead) <= 0:
                    break
    except IOError as err:
        pass # provide error message later
    i += c + 4
I tried it with this Java code, but it failed:
// read file-content into byte array
byte[] decoded = null;
try {
    decoded = IOUtils.toByteArray(new FileReader(fullFilePath), org.apache.commons.codec.Charsets.UTF_8);
} catch (Exception e) {
    e.printStackTrace();
}
// Decode
byte[] fb = null;
try {
    fb = StringUtils.newStringUtf8(Base64.decodeBase64(decoded)).getBytes("UTF-8");
} catch (Exception e1) {
    e1.printStackTrace();
}
byte[] resultBytes = null;
int i = 0;
while (i < fb.length) {
    int c = (fb[i + 0] * 256) + (fb[i + 1]);
    int d = (fb[i + 2] * 256) + (fb[i + 3]);
    int lenRead = 0;
    byte[] a1 = convert2ByteArray(new int[] { 31, 139, 9, 0 });
    byte[] a2 = Arrays.copyOfRange(fb, i, i + c);
    byte[] gzchunk = copyByteArray(a1, a2);
    GZIPInputStream gf = null;
    byte[] readBytes;
    int readSize;
    try {
        while (true) {
            readSize = Math.min(16384, (d - lenRead));
            gf = new GZIPInputStream(new ByteArrayInputStream(gzchunk), readSize);
            int read = gf.read();
            readBytes = ByteBuffer.allocate(4).putInt(read).array();
            lenRead += readBytes.length;
            resultBytes = copyByteArray(resultBytes, readBytes);
            if (readBytes.length == 0 | (d - lenRead) <= 0) {
                break;
            }
        }
    } catch (Exception e) {
        e.printStackTrace();
    }
    i += c + 4;
}
Thanks for your support.
Some more details about the failure. In Python,
print(fb[i + 0])
print(fb[i + 1])
print(fb[i + 2])
print(fb[i + 4])
prints:
30
208
234
120
With my Java code, the output is:
30
-48
-22
96
java.lang.IllegalArgumentException: buffer size <= 0
at the line
gf = new GZIPInputStream(new ByteArrayInputStream(gzchunk), readSize);
@Joop:
Following your suggestion, I have now written the equivalent of this Python code:
c = decoded[i + 0] * 256 + decoded[i + 1]
d = decoded[i + 2] * 256 + decoded[i + 3]
in Java as:
int c= ((fb[i + 0] & 0xFF) << 8) | (fb[i + 1] & 0xFF);
int d= ((fb[i + 2] & 0xFF) << 8) + (fb[i + 3]);
But I still get different values for the same data. Python:
c = 7888
d = 60000
Java:
c = 27375
d = 48829
Basically, what I want to do is:
Answer 0 (score: 1)
The error: the type byte is signed, so when a byte is widened to int it must be masked (& 0xFF) to prevent sign extension.
int c = ((fb[i + 0] & 0xFF) << 8) | (fb[i + 1] & 0xFF);
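To make the effect concrete, here is a minimal sketch that uses the four header bytes printed in the question (30, 208, 234, 96). The class and method names are made up for illustration. It compares the unmasked arithmetic from the question with the masked version, and it suggests where the "buffer size <= 0" exception comes from: the unmasked d is negative, so readSize is negative, and GZIPInputStream rejects a non-positive buffer size.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.util.zip.GZIPInputStream;

// Hypothetical demo class; the names are not from the original post.
class SignExtensionDemo {

    // Unmasked arithmetic as in the question: each byte is sign-extended.
    static int unmasked16(byte[] b, int off) {
        return (b[off] * 256) + b[off + 1];
    }

    // Masked arithmetic: each byte is treated as a value in 0..255.
    static int masked16(byte[] b, int off) {
        return ((b[off] & 0xFF) << 8) | (b[off + 1] & 0xFF);
    }

    // Returns true if the GZIPInputStream constructor rejects the buffer size.
    static boolean rejectsBufferSize(int size) {
        try {
            new GZIPInputStream(new ByteArrayInputStream(new byte[0]), size).close();
            return false;
        } catch (IllegalArgumentException e) {
            return true;  // thrown before any gzip data is read
        } catch (IOException e) {
            return false; // size was accepted, the empty input was not
        }
    }

    public static void main(String[] args) {
        byte[] fb = { 30, (byte) 208, (byte) 234, 96 };
        System.out.println(unmasked16(fb, 0)); // 7632
        System.out.println(unmasked16(fb, 2)); // -5536
        System.out.println(masked16(fb, 0));   // 7888
        System.out.println(masked16(fb, 2));   // 60000
        int readSize = Math.min(16384, unmasked16(fb, 2));
        System.out.println(rejectsBufferSize(readSize)); // true
    }
}
```

The masked results match the Python values c = 7888 and d = 60000 from the question.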
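A signed byte ranges from -128 to 127, so a1 can simply be written as follows. Note that the third byte should be 8 (deflate), as in the Python code, not 9 as in the Java code of the question:
byte[] a1 = new byte[] { 31, (byte)139, 8, 0 };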
Then you can make fuller use of what Java offers:
// read file-content into byte array
Path path = Paths.get(fullFilePath);
byte[] decoded = Files.readAllBytes(path);
// Decode (java.util.Base64, Java 8+)
byte[] fb = Base64.getDecoder().decode(decoded);
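One thing the snippet above changes in passing: the decoded bytes are no longer sent through a UTF-8 String, as the question's code did with StringUtils.newStringUtf8(...).getBytes("UTF-8"). The sketch below is only an illustration of why that matters. It assumes the commons-codec call behaves like the plain JDK new String(bytes, UTF_8), and the class name is hypothetical. Binary data that is not valid UTF-8 does not survive the round trip.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class; models the String round trip with plain JDK calls.
class Utf8RoundTripDemo {

    // Decodes raw bytes as UTF-8 text and encodes the text back to bytes.
    static byte[] throughUtf8String(byte[] raw) {
        String text = new String(raw, StandardCharsets.UTF_8);
        return text.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // The four header bytes from the question; 0xD0 0xEA is not valid UTF-8.
        byte[] raw = { 30, (byte) 208, (byte) 234, 96 };
        byte[] mangled = throughUtf8String(raw);
        System.out.println(Arrays.toString(raw));
        System.out.println(Arrays.toString(mangled));
        System.out.println(Arrays.equals(raw, mangled)); // false
    }
}
```

Pure ASCII input would pass through unchanged, which is why this kind of bug can go unnoticed on small test data.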
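I would catch the exceptions in a larger scope, since processing has to stop there anyway, and it is simpler.
I did not check the loop; it is something that could be simplified.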
After the additional debugging information in the question:
The debugging code
print(fb[i + 4])
should be
print(fb[i + 3])
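c is now correct. Java reports -48 instead of 208 because byte is signed: 256 - 48 = 208 and 256 - 22 = 234. For d, some of the old code still gets the sign extension wrong; it should be: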
int d = ((fb[i + 2] & 0xFF) << 8) | (fb[i + 3] & 0xFF);
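The half-masked d from the follow-up happens to work for the sample bytes, because the low byte (96) is below 128. The following sketch, with hypothetical names and a made-up second byte pair, shows that it goes wrong as soon as the low byte is 128 or more. It also writes the correct version with Byte.toUnsignedInt, which is available since Java 8 and is equivalent to masking with & 0xFF.

```java
// Hypothetical demo class; the second byte pair is invented for illustration.
class HalfMaskedDemo {

    // As in the follow-up: high byte masked, low byte left signed.
    static int halfMasked16(byte hi, byte lo) {
        return ((hi & 0xFF) << 8) + lo;
    }

    // Both bytes converted to 0..255 before combining.
    static int unsigned16(byte hi, byte lo) {
        return (Byte.toUnsignedInt(hi) << 8) | Byte.toUnsignedInt(lo);
    }

    public static void main(String[] args) {
        // Bytes from the question: low byte < 128, both versions agree.
        System.out.println(halfMasked16((byte) 234, (byte) 96)); // 60000
        System.out.println(unsigned16((byte) 234, (byte) 96));   // 60000
        // Low byte >= 128: the half-masked version is off by 256.
        System.out.println(halfMasked16((byte) 234, (byte) 208)); // 59856
        System.out.println(unsigned16((byte) 234, (byte) 208));   // 60112
    }
}
```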
I tried to simplify the loop; no guarantees.
ByteArrayOutputStream out = new ByteArrayOutputStream();
ByteBuffer inbuf = ByteBuffer.wrap(fb);
while (inbuf.hasRemaining()) {
    int i = inbuf.position(); // start of this record
    int c = inbuf.getShort() & 0xFFFF;
    int d = inbuf.getShort() & 0xFFFF;
    assert i + c <= inbuf.limit();
    // As in the Python code: gzip magic, method 8 (deflate), flags 0,
    // then the c bytes starting at i (the four length bytes included).
    byte[] gzchunk = new byte[4 + c];
    gzchunk[0] = 31;
    gzchunk[1] = (byte)139;
    gzchunk[2] = 8;
    gzchunk[3] = 0;
    System.arraycopy(fb, i, gzchunk, 4, c);
    byte[] readBytes = new byte[d];
    GZIPInputStream gf = new GZIPInputStream(
            new ByteArrayInputStream(gzchunk), d);
    int nread = gf.read(readBytes, 0, d);
    // No loop required as non-blocking ByteArrayInputStream.
    assert nread == d;
    out.write(readBytes);
    gf.close();
    // Like Python's i += c + 4, clamped to the end of the buffer.
    inbuf.position(Math.min(i + 4 + c, inbuf.limit()));
}
out.close();
return out.toByteArray();
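Since there is no 16K limit here (perhaps that is a Python limitation?), reading becomes simpler. On Java 9 or later, readAllBytes should be used instead of read, because read may deliver only a partial result, namely whatever is currently available. A ByteArrayInputStream, however, has all of its data available.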
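If you would rather not rely on a single read call returning everything, a small loop that reads exactly d bytes, as the Python loop does, is a safe alternative. This is a sketch with hypothetical names that also works on Java 8; on Java 9 or later, InputStream.readNBytes(byte[], int, int) does much the same. The gzip helper exists only to produce test input. The deliberately tiny buffer size of 16 forces the stream to refill many times.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical demo class; the names are not from the original post.
class ReadExactlyDemo {

    // Reads until n bytes have arrived or the stream ends, whichever is first.
    static byte[] readExactly(InputStream in, int n) throws IOException {
        byte[] buf = new byte[n];
        int total = 0;
        while (total < n) {
            int r = in.read(buf, total, n - total);
            if (r < 0) {
                break; // end of stream before n bytes
            }
            total += r;
        }
        return total == n ? buf : Arrays.copyOf(buf, total);
    }

    // Compresses data into a complete gzip stream (test input only).
    static byte[] gzip(byte[] data) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write(data);
        }
        return bos.toByteArray();
    }

    // Compresses and decompresses again, using the given inflater buffer size.
    static byte[] roundTrip(byte[] data, int bufferSize) {
        try (GZIPInputStream in = new GZIPInputStream(
                new ByteArrayInputStream(gzip(data)), bufferSize)) {
            return readExactly(in, data.length);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] data = new byte[100000];
        for (int k = 0; k < data.length; k++) {
            data[k] = (byte) (k * 31 % 251);
        }
        System.out.println(Arrays.equals(data, roundTrip(data, 16))); // true
    }
}
```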
Using a ByteBuffer, whose default byte order is ByteOrder.BIG_ENDIAN, lets getShort do our calculation for us.
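As a small check of that claim, the sketch below reads the two 16-bit values from the header bytes given in the question. The class and method names are hypothetical. getShort returns a signed short, so the & 0xFFFF mask is still needed to get the value as an unsigned number.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical demo class; the names are not from the original post.
class ByteBufferHeaderDemo {

    // Reads two unsigned big-endian 16-bit values from the start of a record.
    static int[] readHeader(byte[] record) {
        ByteBuffer buf = ByteBuffer.wrap(record); // big-endian by default
        int c = buf.getShort() & 0xFFFF;
        int d = buf.getShort() & 0xFFFF;
        return new int[] { c, d };
    }

    public static void main(String[] args) {
        byte[] fb = { 30, (byte) 208, (byte) 234, 96 };
        int[] header = readHeader(fb);
        System.out.println(header[0]); // 7888
        System.out.println(header[1]); // 60000
        System.out.println(ByteBuffer.wrap(fb).order() == ByteOrder.BIG_ENDIAN); // true
    }
}
```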