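I am writing a method that creates fixed-length messages for an interface between two other systems.
Each item of the message must be sent with its agreed length in bytes, and if an item is longer than the agreed length, it must be truncated to that length.
The message contains 2-byte characters, so if the cut falls in the middle of a character, that character ends up broken.
To find the correct byte position, my code scans from the beginning of the string up to the length to cut. I expect this to perform poorly when the message is long.
I couldn't find a better way, so I am asking for help here. Sorry that the code is convoluted and redundant. The whole project is available here.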
package thecodinglog.string;

public class StringHelper {
    public static String substrb2(String str, Number beginByte) {
        return substrb2(str, beginByte, null, null, null);
    }

    public static String substrb2(String str, Number beginByte, Number byteLength) {
        return substrb2(str, beginByte, byteLength, null, null);
    }
    /**
     * Returns a substring of the given String, selected by byte position and byte length.
     * It distinguishes between 1-byte and 2-byte characters and returns exactly the specified byte length.
     * If the start position or the specified length would cut a 2-byte character in half,
     * that half is converted to a space.
     * When a length is specified, the result can be padded on the left or on the right (not both).
     *
     * If beginByte is 0, it is treated as 1.
     * If beginByte is less than 0, the position is counted from the right end of the string.
     * If beginByte or byteLength has a fractional part, it is discarded.
     * If no length is specified, everything from the start position to the end of the string is returned.
     *
     * Examples:
     * <blockquote><pre>
     * StringHelper.substrb2("a好호b", 1, 10, null, "|") returns "a好호b||||"
     * StringHelper.substrb2("ab한글", 4, 2) returns "  "
     * StringHelper.substrb2("한a글", -3, 2) returns "a "
     * StringHelper.substrb2("abcde한글이han gul다ykd", 7) returns " 글이han gul다ykd"
     * </pre></blockquote>
     *
     * @param str          the string to take the substring from
     * @param beginByte    the 1-based starting byte position
     * @param byteLength   the length in bytes
     * @param leftPadding  a padding character; it must be a 1-byte character
     * @param rightPadding a padding character; it must be a 1-byte character
     * @return the substring
     */
    public static String substrb2(String str, Number beginByte, Number byteLength, String leftPadding, String rightPadding) {
        if (str == null || str.equals("")) {
            throw new IllegalArgumentException("The source string can not be an empty string or null.");
        }
        if (leftPadding != null && rightPadding != null) {
            throw new IllegalArgumentException("At least one of leftPadding and rightPadding must be null.");
        }
        if (leftPadding != null) {
            if (leftPadding.length() != 1) {
                throw new IllegalArgumentException("The length of the padding string must be one.");
            }
            if (getByteLengthOfChar(leftPadding.charAt(0)) != 1) {
                throw new IllegalArgumentException("The padding string must be a 1-byte character.");
            }
        }
        if (rightPadding != null) {
            if (rightPadding.length() != 1) {
                throw new IllegalArgumentException("The length of the padding string must be one.");
            }
            if (getByteLengthOfChar(rightPadding.charAt(0)) != 1) {
                throw new IllegalArgumentException("The padding string must be a 1-byte character.");
            }
        }
        int beginPosition = beginByte.intValue();
        if (beginPosition == 0) beginPosition = 1;
        int length;
        if (byteLength != null) {
            length = byteLength.intValue();
            if (length < 0) {
                return null;
            }
        } else {
            length = -1;
        }
        if (length == 0)
            return null;
        boolean beginHalf = false;
        int accByte = 0;
        int startIndex = -1;
        if (beginPosition >= 0) {
            for (int i = 0; i < str.length(); i++) {
                if (beginPosition - 1 == accByte) {
                    startIndex = i;
                    accByte = accByte + getByteLengthOfChar(str.charAt(i));
                    break;
                } else if (beginPosition == accByte) {
                    beginHalf = true;
                    startIndex = i;
                    accByte = accByte + getByteLengthOfChar(str.charAt(i));
                    break;
                } else if (accByte + 2 == beginPosition && i == str.length() - 1) {
                    beginHalf = true;
                    accByte = accByte + getByteLengthOfChar(str.charAt(i));
                    break;
                }
                accByte = accByte + getByteLengthOfChar(str.charAt(i));
            }
        } else {
            beginPosition = beginPosition * -1;
            if (length > beginPosition) {
                length = beginPosition;
            }
            for (int i = str.length() - 1; i >= 0; i--) {
                accByte = accByte + getByteLengthOfChar(str.charAt(i));
                if (i == str.length() - 1) {
                    if (getByteLengthOfChar(str.charAt(i)) == 1) {
                        if (beginPosition == accByte) {
                            startIndex = i;
                            break;
                        }
                    } else {
                        if (beginPosition == accByte) {
                            if (length > 1) {
                                startIndex = i;
                                break;
                            } else {
                                beginHalf = true;
                                break;
                            }
                        } else if (beginPosition == accByte - 1) {
                            if (length == 1) {
                                beginHalf = true;
                                break;
                            }
                        }
                    }
                } else {
                    if (getByteLengthOfChar(str.charAt(i)) == 1) {
                        if (beginPosition == accByte) {
                            startIndex = i;
                            break;
                        }
                    } else {
                        if (beginPosition == accByte) {
                            if (length > 1) {
                                startIndex = i;
                                break;
                            } else {
                                beginHalf = true;
                                break;
                            }
                        } else if (beginPosition == accByte - 1) {
                            if (length > 1) {
                                startIndex = i + 1;
                            }
                            beginHalf = true;
                            break;
                        }
                    }
                }
            }
        }
        if (accByte < beginPosition) {
            throw new IndexOutOfBoundsException("The start position is larger than the length of the original string.");
        }
        StringBuilder stringBuilder = new StringBuilder();
        int accSubstrLength = 0;
        if (beginHalf) {
            stringBuilder.append(" ");
            accSubstrLength++;
        }
        if (byteLength == null) {
            stringBuilder.append(str.substring(startIndex));
            return new String(stringBuilder);
        }
        for (int i = startIndex; i < str.length() && startIndex >= 0; i++) {
            accSubstrLength = accSubstrLength + getByteLengthOfChar(str.charAt(i));
            if (accSubstrLength == length) {
                stringBuilder.append(str.charAt(i));
                break;
            } else if (accSubstrLength - 1 == length) {
                stringBuilder.append(" ");
                break;
            } else if (accSubstrLength - 1 > length) {
                break;
            }
            stringBuilder.append(str.charAt(i));
        }
        if (leftPadding != null) {
            int diffLength = byteLength.intValue() - accSubstrLength;
            StringBuilder padding = new StringBuilder();
            for (int i = 0; i < diffLength; i++) {
                padding.append(leftPadding);
            }
            stringBuilder.insert(0, padding);
        }
        if (rightPadding != null) {
            int diffLength = byteLength.intValue() - accSubstrLength;
            StringBuilder padding = new StringBuilder();
            for (int i = 0; i < diffLength; i++) {
                padding.append(rightPadding);
            }
            stringBuilder.append(padding);
        }
        return new String(stringBuilder);
    }
    // Treats chars below 128 (ASCII) as 1 byte and every other char as 2 bytes, as in EUC-KR.
    private static int getByteLengthOfChar(char c) {
        if ((int) c < 128) {
            return 1;
        } else {
            return 2;
        }
    }
}
This is the new code I tried:
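import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.*;
import java.util.Arrays;

String testData = "한글이가득";
Charset charset = Charset.forName("EUC-KR");
ByteBuffer byteBuffer = charset.encode(testData);
// Bytes 1..4 of the encoded string: this starts in the middle of 한 (c7 d1)
byte[] newone = Arrays.copyOfRange(byteBuffer.array(), 1, 5);
// Replace malformed or unmappable input with a space instead of failing
CharsetDecoder charsetDecoder = charset.newDecoder()
        .replaceWith(" ")
        .onMalformedInput(CodingErrorAction.REPLACE)
        .onUnmappableCharacter(CodingErrorAction.REPLACE);
// decode() declares the checked CharacterCodingException
CharBuffer charBuffer = charsetDecoder.decode(ByteBuffer.wrap(newone));
System.out.println(charBuffer.toString());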
I expected "글" but got "畸邦". I think the start index has to be at a correct decoding position, but I don't see any way to tell the method what I want.
Edit: an example of the failure
| byte index | 0 1   | 2 3   | 4 5   | 6 7   | 8 9   |
| ---------- | ----- | ----- | ----- | ----- | ----- |
| char       | 한    | 글    | 이    | 가    | 득    |
| hex        | c7 d1 | b1 db | c0 cc | b0 a1 | b5 e6 |
Suppose the start index is 1 and the length is 4 bytes. The sub-sequence of bytes is then:
| byte index | 0 1         | 2 3   | 4 5         | 6 7   | 8 9   |
| ---------- | ----------- | ----- | ----------- | ----- | ----- |
| char       | 한          | 글    | 이          | 가    | 득    |
| hex        | c7 d1       | b1 db | c0 cc       | b0 a1 | b5 e6 |
| sub        | d1 (byte 1) | b1 db | c0 (byte 4) |       |       |
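When the decoder decodes d1 b1 db c0, it treats d1b1 as one character and dbc0 as another. This may differ between charsets, but in this case the character boundaries shift. Unless the decoder knows which bytes belonged together in the original characters, it decodes them into the wrong characters, because the bytes themselves carry no information about where a character starts.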
I think the key to this method is how to let the decoder know the starting position (in bytes) of the original characters.
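One way around the alignment problem, not taken from either answer below, is to never decode a raw byte slice at all: compute once the byte offset at which each character starts, then map a byte position to its character with a binary search. The sketch below assumes text without surrogate pairs (true for EUC-KR Hangul), mappable characters, 0-based byte positions, and Java 11+ for `String.repeat`; `ByteOffsetSubstring`, `boundaries`, `charContaining` and `substrb` are hypothetical names. Following the question's own convention, each byte of a cut character becomes a space, so the failing example above (start 1, length 4) gives `" 글 "`.

```java
import java.nio.charset.Charset;
import java.util.Arrays;

public class ByteOffsetSubstring {

    // off[i] = number of bytes that str[0, i) occupies in cs, so character i
    // occupies bytes [off[i], off[i + 1]). Assumes str has no surrogate pairs.
    static int[] boundaries(String str, Charset cs) {
        int[] off = new int[str.length() + 1];
        for (int i = 0; i < str.length(); i++) {
            off[i + 1] = off[i] + String.valueOf(str.charAt(i)).getBytes(cs).length;
        }
        return off;
    }

    // Index of the character whose bytes include byte position pos (0 <= pos < total).
    static int charContaining(int[] off, int pos) {
        int k = Arrays.binarySearch(off, pos);
        return k >= 0 ? k : -k - 2; // not found: insertion point minus one
    }

    // Bytes [begin, begin + length) of the encoded string (0-based). A character cut
    // at either end contributes one space per byte of it that falls inside the range.
    static String substrb(String str, int begin, int length, Charset cs) {
        if (begin < 0 || length < 0) {
            throw new IllegalArgumentException("begin and length must be non-negative");
        }
        int[] off = boundaries(str, cs);
        int end = Math.min(begin + length, off[str.length()]);
        if (begin >= end) {
            return "";
        }
        int first = charContaining(off, begin);
        int last = charContaining(off, end - 1);
        StringBuilder sb = new StringBuilder();
        for (int i = first; i <= last; i++) {
            if (off[i] >= begin && off[i + 1] <= end) {
                sb.append(str.charAt(i)); // character lies entirely inside the range
            } else {
                int kept = Math.min(off[i + 1], end) - Math.max(off[i], begin);
                sb.append(" ".repeat(kept)); // character was cut
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Charset euckr = Charset.forName("EUC-KR");
        System.out.println("[" + substrb("한글이가득", 1, 4, euckr) + "]"); // [ 글 ]
    }
}
```

If the same string is cut at several positions, the offset array can be computed once and reused, so each lookup costs a binary search instead of a scan from the start.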
Answer 0 (score: 1)
将整个String转换为byte []并切割数组更容易。然后尝试将数组片段转换回String。如果转换失败,则跳过片段数组的最后一个字节。
Answer 1 (score: 1)
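There is an NIO way.
Using CharsetEncoder#encode, you can encode a String (or rather a CharBuffer, but the conversion is trivial) into a byte array (actually a ByteBuffer): as many characters of the input as possible are converted, until the input is fully processed, but the output is never overflowed.
CoderResult.OVERFLOW indicates that there is insufficient space in the output buffer to encode any more characters. This method should be invoked again with an output buffer that has more remaining bytes. This is typically done by draining any encoded bytes from the output buffer.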
Edit: here is an example (I'm still not sure exactly what you want to achieve, so this is my best guess) with your string 한글이가득 using the EUC-KR encoding.
First, let's look at the byte representation of each character:
| char | 한   | 글   | 이   | 가   | 득   |
| ---- | ---- | ---- | ---- | ---- | ---- |
| hex  | c7d1 | b1db | c0cc | b0a1 | b5e6 |
So the whole string takes 10 bytes to write.
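Now suppose our message is 9 bytes long. That lets us send 한글이가 (8 bytes), which is 0xc7d1b1dbc0ccb0a1, but since there is not enough room to send 득 (0xb5e6 needs 2 bytes and we only have one), the rest of the buffer should be left blank.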
Indeed:
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.*;

String testData = "한글이가득";
Charset charset = Charset.forName("EUC-KR");
CharsetEncoder encoder = charset.newEncoder();
// We create a 9-byte buffer
ByteBuffer limitedSizeOutput = ByteBuffer.allocate(9);
// We encode as many whole characters as fit
CoderResult coderResult = encoder.encode(CharBuffer.wrap(testData.toCharArray()), limitedSizeOutput, true);
// The encoder tells us that it could not fit all the chars in 9 bytes
System.out.println(coderResult); // prints OVERFLOW
// We can check that it encoded 8 of the 10 bytes that make up the original string
limitedSizeOutput.flip();
System.out.println(limitedSizeOutput.limit()); // prints 8
// Reading the buffer back shows that these bytes are indeed 한글이가
// (decode() declares the checked CharacterCodingException)
System.out.println(charset.newDecoder().decode(limitedSizeOutput).toString());
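The example stops after encoding, but for the fixed-length message in the question the unwritten tail of the buffer still has to be filled. A sketch of that last step, assuming a stateless charset such as EUC-KR (the encoder's flush step, which only matters for stateful charsets, is omitted) and a single-byte pad value such as a space; `FixedLengthField` and `toField` are hypothetical names.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CoderResult;
import java.util.Arrays;

public class FixedLengthField {

    // Encodes as many whole characters of value as fit in 'size' bytes, then fills
    // the bytes the encoder did not write with 'pad' (assumed to be a 1-byte value).
    static byte[] toField(String value, int size, Charset charset, byte pad) {
        CharsetEncoder encoder = charset.newEncoder();
        ByteBuffer out = ByteBuffer.allocate(size);
        CoderResult result = encoder.encode(CharBuffer.wrap(value), out, true);
        // OVERFLOW (did not fit) and UNDERFLOW (all fit) are both fine here;
        // malformed or unmappable input is not.
        if (result.isError()) {
            throw new IllegalArgumentException("Cannot encode input: " + result);
        }
        byte[] field = out.array();
        Arrays.fill(field, out.position(), size, pad);
        return field;
    }

    public static void main(String[] args) {
        Charset euckr = Charset.forName("EUC-KR");
        byte[] field = toField("한글이가득", 9, euckr, (byte) ' ');
        System.out.println(field.length);             // 9
        System.out.println(new String(field, euckr)); // 한글이가 followed by one space
    }
}
```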