Removing ASCII characters from a string using an encoding

Date: 2014-05-27 10:16:16

Tags: java string character-encoding ascii

I have a byte array that is filled by a serial port event, as shown in the following code:

private InputStream input = null;
// ...
// ...
public void serialEvent(SerialPortEvent se) {
  if (se.getEventType() == SerialPortEvent.DATA_AVAILABLE) {
    try {
      int length = input.available();
      if (length > 0) {
        byte[] array = new byte[length];
        int numBytes = input.read(array);
        // Only the first numBytes bytes were actually filled.
        String text = new String(array, 0, numBytes);
      }
    } catch (IOException e) {
      e.printStackTrace();
    }
  }
}

The variable text contains escape sequences such as the following:

"\033[K", "\033[m", "\033[H2J", "\033[6;1H", "\033[?12l", "\033[?25h", "\033[5i", "\033[4i", "\033i" and similar ones.

At the moment, I use String.replace to remove all of these sequences from the string.
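For comparison, a chain of String.replace calls can be collapsed into a single regular-expression pass. This is a minimal sketch that assumes the sequences only take the shapes listed above; the pattern, the class StripEscapes and the helper stripEscapes are hypothetical, not from the question.

```java
import java.util.regex.Pattern;

public class StripEscapes {
    // Hypothetical pattern for the sequences listed in the question:
    // ESC [ K, ESC [ m, ESC [ H<n>J, ESC [ <n>;<n>H, ESC [ ?<n>l / ?<n>h,
    // ESC [ <n>i and ESC i. Anything else is left untouched.
    private static final Pattern ESCAPES = Pattern.compile(
        "\u001B(?:\\[(?:K|m|H\\d+J|\\d+;\\d+H|\\?\\d+[hl]|\\d*i)|i)");

    // Removes every recognized escape sequence in one pass over the text.
    static String stripEscapes(String text) {
        return ESCAPES.matcher(text).replaceAll("");
    }

    public static void main(String[] args) {
        String raw = "\u001B[6;1HHello\u001B[K \u001B[?25hWorld\u001Bi";
        System.out.println(stripEscapes(raw)); // prints "Hello World"
    }
}
```

This still operates on the finished String, so it does not answer the "without replace" part of the question; it only replaces many literal replacements with one pattern.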

I tried new String(array, charsetName) with every available charset option, but none of them removed the sequences.
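Changing the charset cannot help here: ESC (0x1B) is an ordinary ASCII control character, and ASCII-compatible charsets such as US-ASCII, ISO-8859-1 and UTF-8 all decode it to U+001B, so the sequences survive decoding unchanged. A small check (the class and method names are illustrative):

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EscapeSurvivesDecoding {
    // Decodes the bytes ESC [ K with the given charset and reports whether
    // the leading ESC character is still present in the decoded string.
    static boolean escSurvives(Charset cs) {
        byte[] bytes = {0x1B, '[', 'K'};
        String decoded = new String(bytes, cs);
        return decoded.length() == 3 && decoded.charAt(0) == '\u001B';
    }

    public static void main(String[] args) {
        Charset[] charsets = {
            StandardCharsets.US_ASCII,
            StandardCharsets.ISO_8859_1,
            StandardCharsets.UTF_8
        };
        for (Charset cs : charsets) {
            System.out.println(cs + ": ESC kept = " + escSurvives(cs));
        }
    }
}
```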

Is there a way to remove these characters without using the replace method?

1 answer:

Answer 0 (score: 0)

I had given an unsatisfactory answer; thanks to @OlegEstekhin for pointing that out. Since nobody else has answered, and the solution is not a two-liner, here it is.

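Create a wrapping InputStream that throws away the escape sequences. I used a PushbackInputStream so that a partially read sequence that turns out not to be a known escape can still be pushed back and read first. Without that need, a FilterInputStream would suffice.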
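The property of PushbackInputStream this relies on is that bytes passed to unread are returned by the next reads, ahead of the underlying stream, as long as they fit in the pushback buffer size given to the constructor. A minimal demonstration (PushbackDemo and peekThenReadAll are illustrative names):

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.PushbackInputStream;
import java.io.UncheckedIOException;

public class PushbackDemo {
    // Reads up to two bytes, unreads them, then reads the whole stream.
    // The pushed-back bytes come out again before the rest of the data.
    static String peekThenReadAll(byte[] data) {
        // Pushback buffer of 2 bytes: enough for what is unread below.
        try (PushbackInputStream in =
                new PushbackInputStream(new ByteArrayInputStream(data), 2)) {
            byte[] peeked = new byte[2];
            int n = in.read(peeked, 0, peeked.length);
            if (n > 0) {
                in.unread(peeked, 0, n); // give the bytes back to the stream
            }
            StringBuilder sb = new StringBuilder();
            int c;
            while ((c = in.read()) != -1) {
                sb.append((char) c);
            }
            return sb.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(peekThenReadAll(new byte[] {'A', 'B', 'C', 'D'}));
    }
}
```

Unreading more bytes than the buffer can hold throws an IOException, so the buffer must be at least as large as the longest sequence that may ever be pushed back.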

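import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.PushbackInputStream;
import java.nio.charset.StandardCharsets;
import java.util.logging.Level;
import java.util.logging.Logger;
import java.util.regex.Pattern;

public class EscapeRemovingInputStream extends PushbackInputStream {

    public static void main(String[] args) {
        String s = "\u001B[KHello \u001B[H12JWorld!";
        byte[] buf = s.getBytes(StandardCharsets.ISO_8859_1);
        ByteArrayInputStream bais = new ByteArrayInputStream(buf);
        EscapeRemovingInputStream bin = new EscapeRemovingInputStream(bais);
        try (InputStreamReader in = new InputStreamReader(bin,
                StandardCharsets.ISO_8859_1)) {
            int c;
            while ((c = in.read()) != -1) {
                System.out.print((char) c);
            }
            System.out.println(); // Output: Hello World!
        } catch (IOException ex) {
            Logger.getLogger(EscapeRemovingInputStream.class.getName()).log(
                Level.SEVERE, null, ex);
        }
    }

    // The sequences from the question: ESC [ K, ESC [ m, ESC [ H<n>J,
    // ESC [ <n>;<n>H, ESC [ ?<n>l / ?<n>h, ESC [ <n>i and ESC i.
    // [hl] instead of \w, because \w also matches a digit of "?12".
    private static final Pattern ESCAPE_PATTERN = Pattern.compile(
        "\u001B(?:\\[(?:K|m|H\\d+J|\\d+;\\d+H|\\?\\d+[hl]|\\d*i)|i)");
    private static final int MAX_ESCAPE_LENGTH = 20;

    private final byte[] escapeSequence = new byte[MAX_ESCAPE_LENGTH];
    private int escapeLength = 0;
    private boolean eof = false;

    public EscapeRemovingInputStream(InputStream in) {
        // The pushback buffer must hold a whole unrecognized sequence.
        super(in, MAX_ESCAPE_LENGTH);
    }

    @Override
    public int read(byte[] b, int off, int len) throws IOException {
        for (int i = 0; i < len; ++i) {
            // Return what we have rather than block waiting for serial data.
            if (i > 0 && available() == 0) {
                return i;
            }
            int c = read();
            if (c == -1) {
                return i == 0 ? -1 : i;
            }
            b[off + i] = (byte) c;
        }
        return len;
    }

    @Override
    public int read() throws IOException {
        while (true) { // A loop instead of recursion: no deep call stacks.
            int c = eof ? -1 : super.read();
            if (c == -1) { // Throw away a trailing half escape sequence.
                eof = true;
                escapeLength = 0;
                return -1;
            }
            if (escapeLength == 0 && c != 0x1B) {
                return c; // Ordinary byte outside any escape sequence.
            }
            escapeSequence[escapeLength++] = (byte) c;
            String esc = new String(escapeSequence, 0, escapeLength,
                    StandardCharsets.ISO_8859_1);
            if (ESCAPE_PATTERN.matcher(esc).matches()) {
                escapeLength = 0; // Complete sequence recognized: drop it.
            } else if (escapeLength == MAX_ESCAPE_LENGTH) {
                // Not a known sequence: push the bytes back and pass the
                // ESC through as ordinary data.
                escapeLength = 0;
                unread(escapeSequence);
                return super.read(); // No longer registering the escape.
            }
        }
    }

}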
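The flow of a single call is:
  • The user calls EscapeRemovingInputStream.read.
  • That read may itself issue further reads to fill the byte buffer escapeSequence.
  • (Bytes may be pushed back by calling unread.)
  • The original read returns.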
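As written, an unrecognized sequence is only pushed back after MAX_ESCAPE_LENGTH bytes have been buffered. One possible refinement, not part of the original answer, uses Matcher.hitEnd(): when matches() fails and hitEnd() is false, no additional input can produce a match, so the buffered bytes could be unread immediately. A sketch, with a hypothetical classify helper and the pattern restated to cover the question's sequences:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EscapePrefixCheck {
    // Hypothetical pattern mirroring the sequences listed in the question.
    private static final Pattern ESCAPE_PATTERN = Pattern.compile(
        "\u001B(?:\\[(?:K|m|H\\d+J|\\d+;\\d+H|\\?\\d+[hl]|\\d*i)|i)");

    enum Status { MATCH, PREFIX, REJECT }

    // MATCH:  the buffer is a complete escape sequence (drop it).
    // PREFIX: no match yet, but more bytes could still complete one.
    // REJECT: no extension can ever match, so unread the bytes now.
    static Status classify(String buffered) {
        Matcher m = ESCAPE_PATTERN.matcher(buffered);
        if (m.matches()) {
            return Status.MATCH;
        }
        return m.hitEnd() ? Status.PREFIX : Status.REJECT;
    }

    public static void main(String[] args) {
        String[] samples = {"\u001B[K", "\u001B[6;", "\u001BX"};
        for (String s : samples) {
            System.out.println(s.replace("\u001B", "ESC") + " -> " + classify(s));
        }
    }
}
```

In the stream, the REJECT case would take the place of the escapeLength == MAX_ESCAPE_LENGTH check, so a stray ESC no longer delays the following 19 bytes.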

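The escape sequences seem to follow a grammar: a command letter with numeric parameters. That is why I used a regular expression to recognize them.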
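If the device only sends standard ECMA-48 (ANSI) sequences, that grammar can also be recognized without a regular expression: a CSI sequence is ESC [ followed by parameter bytes (0x30-0x3F), optional intermediate bytes (0x20-0x2F) and one final byte (0x40-0x7E). A byte-level state machine decides each byte as it arrives, so nothing ever has to be pushed back and a plain FilterInputStream suffices. This is a swapped-in technique, not the answer's regex approach; treating ESC plus any single non-[ byte as a complete two-byte escape is a simplifying assumption, and under this grammar the question's ESC [ H 2 J parses as ESC [ H followed by the literal text 2J. The class name and the strip helper are illustrative.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Sketch: strips ECMA-48 escape sequences with a byte-level state machine.
// No byte ever has to be pushed back, so FilterInputStream is enough.
public class CsiStrippingInputStream extends FilterInputStream {
    private static final int ESC = 0x1B;

    private enum State { TEXT, AFTER_ESC, IN_CSI }

    private State state = State.TEXT;

    public CsiStrippingInputStream(InputStream in) {
        super(in);
    }

    @Override
    public int read() throws IOException {
        int c;
        while ((c = in.read()) != -1) {
            switch (state) {
                case TEXT:
                    if (c != ESC) {
                        return c; // ordinary data byte
                    }
                    state = State.AFTER_ESC;
                    break;
                case AFTER_ESC:
                    // ESC [ opens a CSI sequence; ESC plus any other single
                    // byte (e.g. ESC i) is assumed to be a two-byte escape.
                    state = (c == '[') ? State.IN_CSI : State.TEXT;
                    break;
                case IN_CSI:
                    // Skip parameter/intermediate bytes until the final byte.
                    if (c >= 0x40 && c <= 0x7E) {
                        state = State.TEXT;
                    }
                    break;
            }
        }
        return -1; // end of stream; a trailing partial sequence is dropped
    }

    @Override
    public int read(byte[] b, int off, int len) throws IOException {
        if (len == 0) {
            return 0;
        }
        int n = 0;
        while (n < len) {
            // After the first byte, return instead of blocking on a slow source.
            if (n > 0 && in.available() == 0) {
                break;
            }
            int c = read();
            if (c == -1) {
                break;
            }
            b[off + n++] = (byte) c;
        }
        return n == 0 ? -1 : n;
    }

    @Override
    public boolean markSupported() {
        return false; // mark/reset would not restore the parser state
    }

    // Illustrative helper: filters a Latin-1 string through the stream.
    static String strip(String s) {
        byte[] bytes = s.getBytes(StandardCharsets.ISO_8859_1);
        StringBuilder sb = new StringBuilder();
        try (InputStream in = new CsiStrippingInputStream(
                new ByteArrayInputStream(bytes))) {
            byte[] buf = new byte[4];
            int n;
            while ((n = in.read(buf, 0, buf.length)) != -1) {
                sb.append(new String(buf, 0, n, StandardCharsets.ISO_8859_1));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(strip(
            "\u001B[K\u001B[6;1HHello \u001B[?12lWorld\u001Bi!"));
    }
}
```

The read(byte[], int, int) override matters: FilterInputStream's default version delegates straight to the wrapped stream and would bypass the filter.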