Question

I get bytes from an IOStream and convert them to a String. From that string I extract sequences using the substring API.

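The byte array is 128 bytes long. If the stream contains only 10 bytes, the rest of the array stays filled with zeros (its initial contents). I convert the byte array to a string by passing it to the String constructor, new String(byte[]), and then check the length. The length is 128. Why is it 128? It should be the character length of the 10 bytes. How can I get rid of the zeros when converting to a string? Is there an API that strips the default zeros from a byte array? They cause problems when I create substrings from the resulting string.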

    import java.io.UnsupportedEncodingException;

    public class ZeroPadding {
        public static void main(String[] args) {
            // 128-byte buffer: 6 bytes of data followed by zero padding
            byte[] b = { 99, 116, 101, 100, 46, 13, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
                    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
                    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
                    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
                    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
                    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
                    0, 0, 0, 0, 0, 0, 0 };
            System.out.println("byte length = " + b.length);
            String str;
            try {
                str = new String(b, "UTF-8");
                System.out.println("String length = " + str.length());
                System.out.println(str);
                System.out.println("  ## substring  =  " + str.substring(0));
                System.out.println(" substring length = "
                        + str.substring(0).length());
                System.out.println("Done......");
            } catch (UnsupportedEncodingException e) {
                // "UTF-8" is always supported, so this is not expected to happen
                e.printStackTrace();
            }
        }
    }

Answer 1

To create a String from part of a byte array, use the constructor String(byte[] bytes, int offset, int length, String charsetName). For example:

// uses the first 10 bytes of b
str = new String(b, 0, 10, "UTF-8");

Also, if you are compiling for Java 7 or later, you can use StandardCharsets (from the java.nio.charset package) and avoid having to handle UnsupportedEncodingException. For example:

str = new String(b, 0, 10, StandardCharsets.UTF_8);
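A minimal sketch of the difference, using the question's data. The class name PrefixDecodeDemo is hypothetical, and the count of 6 valid bytes is hard-coded purely for illustration; in real code that count would come from the stream.

```java
import java.nio.charset.StandardCharsets;

// Illustrative sketch; the class and method names are hypothetical.
class PrefixDecodeDemo {
    // Decodes only the first `count` bytes of buf; the zero padding after them is ignored.
    static String decodePrefix(byte[] buf, int count) {
        return new String(buf, 0, count, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] b = new byte[128];                  // zero-filled, like the question's buffer
        byte[] data = {99, 116, 101, 100, 46, 13}; // "cted." followed by '\r'
        System.arraycopy(data, 0, b, 0, data.length);

        // Every 0x00 byte decodes to a U+0000 char, so the whole buffer gives 128 chars.
        String whole = new String(b, StandardCharsets.UTF_8);
        String prefix = decodePrefix(b, data.length);
        System.out.println("whole length  = " + whole.length());  // 128
        System.out.println("prefix length = " + prefix.length()); // 6
    }
}
```

This also answers the "why 128?" part: the zero bytes are not dropped by the decoder, each one becomes a NUL character in the String.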

Answer 2

Your code could look like this:

    byte[] b = { 99, 116, 101, 100, 46, 13, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
            0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
            0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
            0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
            0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
            0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
            0, 0, 0, 0 };

    // Length of the data: everything up to and including the last non-zero byte
    int length = 0;
    for (int i = b.length - 1; i >= 0; i--) {
        if (b[i] != 0) {
            length = i + 1;
            break;
        }
    }

    System.out.println("byte length = " + b.length);
    String str;
    try {
        str = new String(b, 0, length, "UTF-8");
        System.out.println("String length = " + str.length());
        System.out.println(str);
        System.out.println("  ## substring  =  " + str.substring(0));
        System.out.println(" substring length = "
                + str.substring(0).length());
        System.out.println("Done......");
    } catch (UnsupportedEncodingException e) {
        // "UTF-8" is always supported, so this is not expected to happen
        e.printStackTrace();
    }

You can also do this:

 String zerostring = new String(new byte[]{0});
 str = new String(b).replace(zerostring, "");
 System.out.println(str);

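The drawback is that this removes every NUL character, including any that occur in the middle of the data, not just the trailing padding.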
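To make the drawback concrete, a small sketch comparing the replace-everything approach with a trailing-only trim. The class and method names are hypothetical, and it assumes the data may legitimately contain a NUL byte in the middle. Note that the digit '0' (byte 48) is a different byte and is never affected.

```java
import java.nio.charset.StandardCharsets;

// Illustrative sketch; the class and method names are hypothetical.
class NulReplaceDemo {
    // Removes every U+0000 character, including ones in the middle of the data.
    static String removeAllNuls(String s) {
        return s.replace("\0", "");
    }

    // Removes only the U+0000 characters at the end of the string.
    static String removeTrailingNuls(String s) {
        int end = s.length();
        while (end > 0 && s.charAt(end - 1) == '\0') {
            end--;
        }
        return s.substring(0, end);
    }

    public static void main(String[] args) {
        // 'a', NUL, '0', 'b', then two padding NULs
        byte[] b = {97, 0, 48, 98, 0, 0};
        String s = new String(b, StandardCharsets.UTF_8);
        System.out.println(removeAllNuls(s).length());      // 3: 'a', '0', 'b'
        System.out.println(removeTrailingNuls(s).length()); // 4: 'a', NUL, '0', 'b'
    }
}
```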

Answer 3

First, some explanation.

Not every byte sequence is valid UTF-8. The byte 0 (0x00) is valid, and unlike in C it does not terminate a string.

In fact, the terminating \0 was later criticized as suboptimal by Kernighan or Ritchie (of K&R C).

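To avoid problems, Java's modified UTF-8 (as used by DataOutputStream) encodes not only the Unicode code points above U+007F (0x7F) as multi-byte sequences (with the high bit set), but also U+0000.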

byte[] bytes = ...; // UTF-8 bytes obtained from a string

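Depending on how those bytes were produced, code point 0 may therefore appear as a multi-byte sequence (0xC0 0x80 in Java's modified UTF-8) rather than as a single 0x00 byte (standard UTF-8).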
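A sketch comparing the two encodings for a string containing U+0000 (the class name ModifiedUtf8Demo is hypothetical). DataOutputStream.writeUTF writes a 2-byte big-endian length prefix followed by modified UTF-8, where U+0000 becomes 0xC0 0x80; String.getBytes with StandardCharsets.UTF_8 uses standard UTF-8, where U+0000 stays a single 0x00 byte.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Illustrative sketch; the class and method names are hypothetical.
class ModifiedUtf8Demo {
    // Encodes s with DataOutputStream.writeUTF: 2-byte length, then modified UTF-8.
    static byte[] modifiedUtf8(String s) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bos)) {
            out.writeUTF(s);
        }
        return bos.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        String s = "a\0b";
        // Standard UTF-8: the NUL stays a single 0x00 byte -> [97, 0, 98]
        System.out.println(Arrays.toString(s.getBytes(StandardCharsets.UTF_8)));
        // Modified UTF-8: length 4, then 'a', 0xC0 0x80, 'b' -> [0, 4, 97, -64, -128, 98]
        System.out.println(Arrays.toString(modifiedUtf8(s)));
    }
}
```

So a plain 0x00 byte in the question's buffer is padding or a real NUL from standard UTF-8; it never appears inside the modified-UTF-8 encoding of a character.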

So you can either clean up the bytes with a small loop, or clean up the string:

str = str.replace("\u0000", "");         // removes every NUL character
str = str.replaceFirst("\u0000+$", "");  // removes only the trailing NULs (regex)
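The "small loop" over the bytes is not shown above; a minimal sketch, assuming every 0x00 byte is padding that should be dropped. That holds for standard UTF-8 input, where no character other than U+0000 encodes to a 0x00 byte. The class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Illustrative sketch; the class and method names are hypothetical.
class DropZeroBytes {
    // Copies every non-zero byte to the front of a new array and trims it to size.
    static byte[] withoutZeros(byte[] in) {
        byte[] out = new byte[in.length];
        int n = 0;
        for (byte x : in) {
            if (x != 0) {
                out[n++] = x;
            }
        }
        return Arrays.copyOf(out, n);
    }

    public static void main(String[] args) {
        byte[] b = new byte[128];
        byte[] data = {99, 116, 101, 100, 46, 13};
        System.arraycopy(data, 0, b, 0, data.length);
        String str = new String(withoutZeros(b), StandardCharsets.UTF_8);
        System.out.println(str.length()); // 6
    }
}
```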

Answer 4

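When you read from an InputStream, the read call tells you how many bytes were read. The length of the byte[] itself is mostly irrelevant (apart from defining the maximum number of bytes that can be read in a single call). There is no need to inspect the byte[] afterwards to work out how much of the data is relevant. Keep the return value of read and use it when creating the String.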
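A minimal sketch of using read's return value. A ByteArrayInputStream stands in for the real stream, the 128-byte buffer mirrors the question, and the class and method names are hypothetical. A single read may return fewer bytes than the stream will eventually deliver (and could split a multi-byte UTF-8 character), so real code would typically loop until read returns -1.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Illustrative sketch; the class and method names are hypothetical.
class ReadCountDemo {
    // Performs one read into a 128-byte buffer and decodes only the bytes actually read.
    static String readOnce(InputStream in) throws IOException {
        byte[] buf = new byte[128];
        int n = in.read(buf); // number of bytes read, or -1 at end of stream
        if (n == -1) {
            return "";
        }
        return new String(buf, 0, n, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        InputStream in = new ByteArrayInputStream("0123456789".getBytes(StandardCharsets.UTF_8));
        String s = readOnce(in);
        System.out.println(s + " (" + s.length() + " chars)"); // 0123456789 (10 chars)
    }
}
```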

Also, if all your data is text, consider using an InputStreamReader, perhaps combined with a BufferedReader.
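A sketch of the reader-based approach, assuming the stream is UTF-8 text; the class and method names are hypothetical. The reader does the decoding, so there is no fixed-size byte buffer and no zero padding to strip.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch; the class and method names are hypothetical.
class ReaderDemo {
    // Decodes the stream as UTF-8 and collects it line by line.
    static List<String> readLines(InputStream in) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader =
                new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        InputStream in = new ByteArrayInputStream("first\r\nsecond\n".getBytes(StandardCharsets.UTF_8));
        System.out.println(readLines(in)); // [first, second]
    }
}
```

readLine treats "\n", "\r", and "\r\n" as line terminators, so the trailing carriage return (byte 13) in the question's data is consumed rather than ending up in the string.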

Eliminating default zeros when creating a string from a byte array
