解释Java字符串

时间:2010-10-01 23:11:46

标签: java regex string

有没有办法将字符串如“hello \ r \ n \ world”解释为字符串,其中\ r \ n已转换为实际的字面值。

场景是用户键入正则表达式替换表达式并在UI中键入\ r \ n。那个得到了逃脱,但我想从中得到实际的解释字符串。

2 个答案:

答案 0 :(得分:2)

我并不直接熟悉一种“简单”的处理方式(即我注意到是否有内置的库可以处理这个)。 但实现此目的的一种方法是阅读有关转义序列的JLS规范并编写单遍解析器,它可以查找和评估每个转义序列。 (检查JLS 3.10.6 Escape Sequences for Character and String Literals)。

现在我知道一个事实,你会很快忘记处理一些奇怪的事情,例如八进制转义是棘手的,因为它允许1,2或3位数,并且在每种情况下都允许不同当它是一个转义时,以及它只是整数时的值。

取示例字符串“\ 431”这是与字符“1”连接的ocal转义符'\ 43',因为八进制转义符的第一个数字是4,因此不能是完整的三位数八进制值在这种情况下,它只允许[0-3]作为第一个数字。

大约一年前,我正在为1.3规范的子集共同编写一个Java编译器,它有转义序列,下面我已经包含了处理转义的代码 - 你实际上应该能够代码字面上的实际情况并包含在Utility类中(如果您觉得有慈善事业可能会信用):

private String processCharEscapes(String strVal) {
    // Loop helpers
    char[]          chrArr = strVal.toCharArray();
    StringBuilder   strOut = new StringBuilder(strVal.length());
    String          strEsc = "";    // Escape sequence, string buffer
    Character       chrBuf = null;  // Dangling character buffer

    // Control flags
    boolean inEscape    = false;    // In escape?
    boolean cbOctal3    = true;     // Can be octal 3-digit

    // Parse characters
    for(char c : chrArr) {
        if (!inEscape) {
            // Listen for start of escape sequence
            if (c == '\\') {
                inEscape = true;    // Enter escape
                strEsc = "";        // Reset escape buffer
                chrBuf = null;      // Reset dangling character buffer
                cbOctal3 = true;    // Reset cbOctal3 flag
            } else {
                strOut.append(c);   // Save to output
            }
        } else {
            // Determine escape termination
            if (strEsc.length() == 0) { // First character
                if (c >= 48 && c <= 55) {   // c is a digit [0-7]
                    if (c > 51) {   // c is a digit [4-7]
                        cbOctal3 = false;
                    }
                    strEsc += c;    // Save to buffer
                } else {    // c is a character
                    // Single-character escapes (will terminate escape loop)
                    if (c == 'n') {
                        inEscape = false;
                        strOut.append('\n');
                    } else if(c == 't') {
                        inEscape = false;
                        strOut.append('\t');
                    } else if(c == 'b') {
                        inEscape = false;
                        strOut.append('\b');
                    } else if(c == 'r') {
                        inEscape = false;
                        strOut.append('\r');
                    } else if(c == 'f') {
                        inEscape = false;
                        strOut.append('\f');
                    } else if(c == '\\') {
                        inEscape = false;
                        strOut.append('\\');
                    } else if(c == '\'') {
                        inEscape = false;
                        strOut.append('\'');
                    } else if(c == '"') {
                        inEscape = false;
                        strOut.append('"');
                    } else {
                        // Saw illegal character, after escape character '\'
                        System.err.println(ErrorType.SYNTAX_ERROR, "Illegal character escape sequence, unrecognised escape: \\" + c);
                    }
                }
            } else if(strEsc.length() == 1) {   // Second character (possibly)
                if (c >= 48 && c <= 55) {   // c is a digit [0-7]
                    strEsc += c;    // Save to buffer
                    if (!cbOctal3) {    // Terminate since !cbOctal3
                        inEscape = false;
                    }
                } else {
                    inEscape = false;   // Terminate since c is not a digit
                    chrBuf = c;         // Save dangling character
                }
            } else if(strEsc.length() == 2) {   // Third character (possibly)
                if (cbOctal3 && c >= 48 && c <= 55) {
                    strEsc += c;        // Save to buffer
                } else {
                    chrBuf = c;         // Save dangling character
                }
                inEscape = false;       // Will always terminate after third character, no matter what
            }

            // Did escape sequence terminate, at character c?
            if (!inEscape && strEsc.length() > 0) {
                // strEsc is legal 1-3 digit octal char code, convert and add
                strOut.append((char)Integer.parseInt(strEsc, 8));

                if (chrBuf != null) {   // There was a dangling character
                    // Check for chained escape sequences (e.g. \10\10)
                    if (chrBuf == '\\') {
                        inEscape = true;    // Enter escape
                        strEsc = "";        // Reset escape buffer
                        chrBuf = null;      // Reset dangling character buffer
                        cbOctal3 = true;    // Reset cbOctal3 flag
                    } else {
                        strOut.append(chrBuf);
                    }
                }
            }
        }
    }

    // Check for EOL-terminated escape sequence (special case)
    if (inEscape) {
        // strEsc is legal 1-3 digit octal char code, convert and add
        strOut.append((char)Integer.parseInt(strEsc, 8));

        if (chrBuf != null) {   // There was a dangling character
            strOut.append(chrBuf);
        }
    }

    return strOut.toString();
}

我希望这会对你有所帮助。

答案 1 :(得分:1)

Apache commons StringEscapeUtils.unescapeJava(...)方法将完成这项工作。虽然从javadoc描述中不清楚,但这些方法处理unicode转义以及“普通”Java String转义。