I have UTF-8 text that I want to trim/truncate by bytes, so that I get a new string no longer than a specified number of bytes.
    public static String trimByBytes(String text, int longitudBytes) throws Exception {
        byte bytes_text[] = text.getBytes("UTF-8");
        int negativeBytes = 0;
        byte byte_trimmed[] = new byte[longitudBytes];
        if (byte_trimmed.length <= bytes_text.length) {
            //copy array manually and count negativeBytes
            for (int i = 0; i < byte_trimmed.length; i++) {
                byte_trimmed[i] = bytes_text[i];
                if (byte_trimmed[i] < 0) {
                    negativeBytes++;
                }
            }
            //if negativeBytes are odd
            if (negativeBytes % 2 != 0 && byte_trimmed[byte_trimmed.length - 1] < 0) {
                byte_trimmed[byte_trimmed.length - 1] = 0;//delete last
            }
        } else {
            for (int i = 0; i < bytes_text.length; i++) {
                byte_trimmed[i] = bytes_text[i];
            }
        }
        return new String(byte_trimmed);
    }
For example
Answer 0 (score: 0)
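Create an explicit CharsetDecoder and set CodingErrorAction.IGNORE on it, so that a multi-byte character cut off at the end of the input is dropped instead of being reported as an error.
Since CharsetDecoder works with a ByteBuffer, applying the length limit is as simple as calling the ByteBuffer's limit method: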
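    import java.nio.ByteBuffer;
    import java.nio.charset.CharacterCodingException;
    import java.nio.charset.CharsetDecoder;
    import java.nio.charset.CodingErrorAction;
    import java.nio.charset.StandardCharsets;

    String trimByBytes(String str, int lengthOfBytes) {
        byte[] bytes = str.getBytes(StandardCharsets.UTF_8);
        ByteBuffer buffer = ByteBuffer.wrap(bytes);
        // Only shrink the buffer; a limit larger than the data would throw.
        if (lengthOfBytes < buffer.limit()) {
            buffer.limit(lengthOfBytes);
        }
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder();
        // Skip an incomplete trailing byte sequence instead of failing on it.
        decoder.onMalformedInput(CodingErrorAction.IGNORE);
        try {
            return decoder.decode(buffer).toString();
        } catch (CharacterCodingException e) {
            // Not expected: malformed input is ignored rather than reported.
            throw new RuntimeException(e);
        }
    }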
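If you would rather not go through a decoder, a different technique (not the one shown above) is to move the cut point back until it sits on a character boundary. This works because every UTF-8 continuation byte has the bit pattern 10xxxxxx, so a cut is only inside a character when the first excluded byte is a continuation byte. The sketch below is a minimal version under that assumption; the class name Utf8Truncate and the method name truncateUtf8 are hypothetical, chosen only for illustration.

```java
import java.nio.charset.StandardCharsets;

class Utf8Truncate {

    /**
     * Returns the longest prefix of str whose UTF-8 encoding fits in maxBytes
     * without splitting a multi-byte character. (Hypothetical helper.)
     */
    static String truncateUtf8(String str, int maxBytes) {
        if (maxBytes < 0) {
            throw new IllegalArgumentException("maxBytes must not be negative");
        }
        byte[] bytes = str.getBytes(StandardCharsets.UTF_8);
        if (maxBytes >= bytes.length) {
            return str; // already short enough
        }
        int end = maxBytes;
        // bytes[end] is the first byte that would be cut off. If it is a
        // continuation byte (10xxxxxx), the cut falls inside a character,
        // so step back until bytes[end] starts a new character.
        while (end > 0 && (bytes[end] & 0xC0) == 0x80) {
            end--;
        }
        return new String(bytes, 0, end, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // "h\u00e9llo" is "hello" with an e-acute (2 bytes in UTF-8).
        System.out.println(truncateUtf8("h\u00e9llo", 2)); // returns "h"
        System.out.println(truncateUtf8("h\u00e9llo", 3)); // returns "h\u00e9"
        // Three CJK characters, 3 bytes each; 5 bytes only fit the first one.
        System.out.println(truncateUtf8("\u65e5\u672c\u8a9e", 5).length()); // prints 1
    }
}
```

Both approaches keep only whole characters. The decoder version also tolerates input bytes that were not valid UTF-8 to begin with, while this one relies on the bytes coming straight from getBytes and avoids the decoder setup.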