How to remove unbalanced/unpaired double quotes (in Java)

Asked: 2012-03-29 16:05:07

Tags: java regex string-parsing

I thought I would share this fairly tricky problem with everyone. I am trying to remove unbalanced/unpaired double quotes from a string.

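My work is still in progress and I may be close to a solution, but I do not have a working one yet. I cannot get the unpaired/unpartnered double quotes removed from the string.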

Sample input

string1=injunct! alter ego."
string2=successor "alter ego" single employer"  "proceeding "citation assets"

Output

string1=injunct! alter ego.
string2=successor "alter ego" single employer  proceeding "citation assets"

This question sounds a lot like Using Java remove unbalanced/unpartnered parenthesis

Here is my code so far (it does not remove all of the stray double quotes)

private String removeUnattachedDoubleQuotes(String stringWithDoubleQuotes) {
    String firstPass = "";

    String openingQuotePattern = "\\\"[a-z0-9\\p{Punct}]";
    String closingQuotePattern = "[a-z0-9\\p{Punct}]\\\"";

    int doubleQuoteLevel = 0;
    for (int i = 0; i < stringWithDoubleQuotes.length() - 3; i++) {
        String c = stringWithDoubleQuotes.substring(i, i + 2);
        if (c.matches(openingQuotePattern)) {
            doubleQuoteLevel++;
            firstPass += c;
        }
        else if (c.matches(closingQuotePattern)) {
            if (doubleQuoteLevel > 0) {
                doubleQuoteLevel--;
                firstPass += c;
            }
        }
        else {
            firstPass += c;
        }
    }

    String secondPass = "";
    doubleQuoteLevel = 0;
    for (int i = firstPass.length() - 1; i >= 0; i--) {
        String c = stringWithDoubleQuotes.substring(i, i + 2);
        if (c.matches(closingQuotePattern)) {
            doubleQuoteLevel++;
            secondPass = c + secondPass;
        }
        else if (c.matches(openingQuotePattern)) {
            if (doubleQuoteLevel > 0) {
                doubleQuoteLevel--;
                secondPass = c + secondPass;
            }
        }
        else {
            secondPass = c + secondPass;
        }
    }

    String result = secondPass;

    return result;
}

2 Answers:

Answer 0: (score: 2)

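If there is no nesting, this can probably be done in a single regex. There is a roughly defined notion of delimiters, and it is possible to "bias" those rules to get a better result. It all depends on which rules you lay down. This regex considers three possible cases, in order: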

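  1. Valid pair
  2. Invalid pair (with bias)
  3. Invalid single

It also does not match a pair of quotes across a line break, although it does process several lines held in a single string. To change that, remove the \n wherever you see it.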


    Global context, raw find regex (condensed)

    (?:("[a-zA-Z0-9\p{Punct}][^"\n]*(?<=[a-zA-Z0-9\p{Punct}])")|(?<![a-zA-Z0-9\p{Punct}])"([^"\n]*)"(?![a-zA-Z0-9\p{Punct}])|")
    

    Replacement grouping

    $1$2 or \1\2
    

    Expanded raw regex:

    (?:                            // Grouping
                                      // Try to line up a valid pair
       (                                 // Capt grp (1) start 
         "                               // "
          [a-zA-Z0-9\p{Punct}]              // 1 of [a-zA-Z0-9\p{Punct}]
          [^"\n]*                           // 0 or more non- [^"\n] characters
          (?<=[a-zA-Z0-9\p{Punct}])         // 1 of [a-zA-Z0-9\p{Punct}] behind us
         "                               // "
       )                                 // End capt grp (1)
    
      |                               // OR, try to line up an invalid pair
           (?<![a-zA-Z0-9\p{Punct}])     // Bias, not 1 of [a-zA-Z0-9\p{Punct}] behind us
         "                               // "
       (  [^"\n]*  )                        // Capt grp (2) - 0 or more non- [^"\n] characters
         "                               // "
           (?![a-zA-Z0-9\p{Punct}])      // Bias, not 1 of [a-zA-Z0-9\p{Punct}] ahead of us
    
      |                               // OR, this single " is considered invalid
         "                               // "
    )                               // End Grouping
    

    Perl test case (I don't have Java)

    $str = '
    string1=injunct! alter ego."
    string2=successor "alter ego" single employer "a" free" proceeding "citation assets"
    ';
    
    print "\n'$str'\n";
    
    $str =~ s
    /
      (?:
         (
           "[a-zA-Z0-9\p{Punct}]
            [^"\n]*
            (?<=[a-zA-Z0-9\p{Punct}])
           "
         )
       |
           (?<![a-zA-Z0-9\p{Punct}])
           " 
         (  [^"\n]*  )
           " (?![a-zA-Z0-9\p{Punct}])
       |
           "
      )
    /$1$2/xg;
    
    print "\n'$str'\n";
    

    Output

    '
    string1=injunct! alter ego."
    string2=successor "alter ego" single employer "a" free" proceeding "citation assets"
    '
    
    '
    string1=injunct! alter ego.
    string2=successor "alter ego" single employer "a" free proceeding "citation assets"
    '
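
Since the test case above is Perl only, here is a minimal sketch of how the same regex might be carried over to Java. The class name `UnpairedQuoteCleaner`, the method `clean`, and the helper constant `CLS` are hypothetical names chosen for illustration. The outer non-capturing group is dropped because a top-level alternation behaves the same way. One assumption to keep in mind: by default Java's `\p{Punct}` covers ASCII punctuation only, so results on non-ASCII punctuation may differ from Perl.

```java
import java.util.regex.Pattern;

public class UnpairedQuoteCleaner {
    // Character class used on both sides of a quote to decide whether it
    // "hugs" a word (hypothetical helper constant, not part of the answer).
    private static final String CLS = "[a-zA-Z0-9\\p{Punct}]";

    // Same three alternatives, in the same order as the regex above:
    //   group 1 = valid pair, kept including its quotes
    //   group 2 = text inside a biased invalid pair, kept without quotes
    //   lone "  = dropped
    private static final Pattern QUOTES = Pattern.compile(
          "(\"" + CLS + "[^\"\\n]*(?<=" + CLS + ")\")"
        + "|(?<!" + CLS + ")\"([^\"\\n]*)\"(?!" + CLS + ")"
        + "|\"");

    public static String clean(String input) {
        // A group that did not take part in a match is substituted as an
        // empty string in Java, so "$1$2" can be used as in the Perl version.
        return QUOTES.matcher(input).replaceAll("$1$2");
    }

    public static void main(String[] args) {
        System.out.println(clean("string1=injunct! alter ego.\""));
        System.out.println(clean(
            "string2=successor \"alter ego\" single employer \"a\" free\""
            + " proceeding \"citation assets\""));
        // Case 2 (biased invalid pair): the quotes go, the text stays.
        System.out.println(clean("say \" hello \" now"));
    }
}
```

Run on the two lines of the Perl test string, this sketch should produce the same cleaned lines as the Perl output above; the third call shows the biased invalid pair case, where both quotes are removed and the enclosed text is kept.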
    

Answer 1: (score: 1)

You could use something like this (Perl notation):

s/("(?=\S)[^"]*(?<=\S)")|"/$1/g;

In Java that would be:

str.replaceAll("(\"(?=\\S)[^\"]*(?<=\\S)\")|\"", "$1");
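
To check the one-liner against the inputs from the question, it can be wrapped in a small runnable class. This is only a sketch: the class name `StrayQuoteDemo` and the method `stripStrayQuotes` are made up for illustration. A quote pair is kept only when the opening quote is directly followed by a non-whitespace character and the closing quote is directly preceded by one; every other quote is replaced by the empty group 1.

```java
public class StrayQuoteDemo {
    // Keeps "..." pairs whose quotes touch non-whitespace on the inside
    // (captured as group 1) and removes every other double quote.
    public static String stripStrayQuotes(String str) {
        return str.replaceAll("(\"(?=\\S)[^\"]*(?<=\\S)\")|\"", "$1");
    }

    public static void main(String[] args) {
        // prints: injunct! alter ego.
        System.out.println(stripStrayQuotes("injunct! alter ego.\""));

        // prints: successor "alter ego" single employer  proceeding "citation assets"
        System.out.println(stripStrayQuotes(
            "successor \"alter ego\" single employer\"  \"proceeding \"citation assets\""));
    }
}
```

Note that `[^"]*` also matches line breaks, so unlike the regex in the other answer, a pair may span several lines here.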