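I'd like to share this fairly tricky problem. I am trying to remove unbalanced/unpaired double quotes from a string.
My work is in progress and I may be close to a solution, but I don't have a working one yet: some unpaired/unpartnered double quotes still survive.
Sample input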
string1=injunct! alter ego."
string2=successor "alter ego" single employer" "proceeding "citation assets"
Output
string1=injunct! alter ego.
string2=successor "alter ego" single employer proceeding "citation assets"
This problem sounds a lot like "Using Java remove unbalanced/unpartnered parenthesis".
Here is my code so far (it does not remove all of the stray double quotes):
private String removeUnattachedDoubleQuotes(String stringWithDoubleQuotes) {
    String firstPass = "";
    String openingQuotePattern = "\\\"[a-z0-9\\p{Punct}]";
    String closingQuotePattern = "[a-z0-9\\p{Punct}]\\\"";
    int doubleQuoteLevel = 0;
    for (int i = 0; i < stringWithDoubleQuotes.length() - 3; i++) {
        String c = stringWithDoubleQuotes.substring(i, i + 2);
        if (c.matches(openingQuotePattern)) {
            doubleQuoteLevel++;
            firstPass += c;
        }
        else if (c.matches(closingQuotePattern)) {
            if (doubleQuoteLevel > 0) {
                doubleQuoteLevel--;
                firstPass += c;
            }
        }
        else {
            firstPass += c;
        }
    }
    String secondPass = "";
    doubleQuoteLevel = 0;
    for (int i = firstPass.length() - 1; i >= 0; i--) {
        String c = stringWithDoubleQuotes.substring(i, i + 2);
        if (c.matches(closingQuotePattern)) {
            doubleQuoteLevel++;
            secondPass = c + secondPass;
        }
        else if (c.matches(openingQuotePattern)) {
            if (doubleQuoteLevel > 0) {
                doubleQuoteLevel--;
                secondPass = c + secondPass;
            }
        }
        else {
            secondPass = c + secondPass;
        }
    }
    String result = secondPass;
    return result;
}
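Answer 0 (score: 2)
If there is no nesting, this can probably be done in a single regex. There is a loosely defined notion of delimiters here, and it is possible to "bias" those rules to get a better result. It all depends on which rules you lay down. This regex considers three possible cases, in order: a valid pair, an invalid pair (with bias), and a lone invalid quote.
It also does not match a quoted span past the end of a line, although it does handle several lines combined into a single string. To change that, remove the \n wherever you see it.
Global context, raw find regex (shortened):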
(?:("[a-zA-Z0-9\p{Punct}][^"\n]*(?<=[a-zA-Z0-9\p{Punct}])")|(?<![a-zA-Z0-9\p{Punct}])"([^"\n]*)"(?![a-zA-Z0-9\p{Punct}])|")
Replacement (capture groups):
$1$2 or \1\2
Expanded raw regex:
(?:                                 // Grouping
                                    // Try to line up a valid pair
   (                                // Capt grp (1) start
      "                             // "
      [a-zA-Z0-9\p{Punct}]          // 1 of [a-zA-Z0-9\p{Punct}]
      [^"\n]*                       // 0 or more characters other than " or newline
      (?<=[a-zA-Z0-9\p{Punct}])     // 1 of [a-zA-Z0-9\p{Punct}] behind us
      "                             // "
   )                                // End capt grp (1)
 |                                  // OR, try to line up an invalid pair
   (?<![a-zA-Z0-9\p{Punct}])        // Bias, not 1 of [a-zA-Z0-9\p{Punct}] behind us
   "                                // "
   ( [^"\n]* )                      // Capt grp (2) - 0 or more characters other than " or newline
   "                                // "
   (?![a-zA-Z0-9\p{Punct}])         // Bias, not 1 of [a-zA-Z0-9\p{Punct}] ahead of us
 |                                  // OR, this single " is considered invalid
   "                                // "
)                                   // End Grouping
Perl test case (I don't have Java available)
$str = '
string1=injunct! alter ego."
string2=successor "alter ego" single employer "a" free" proceeding "citation assets"
';
print "\n'$str'\n";
$str =~ s
/
(?:
(
"[a-zA-Z0-9\p{Punct}]
[^"\n]*
(?<=[a-zA-Z0-9\p{Punct}])
"
)
|
(?<![a-zA-Z0-9\p{Punct}])
"
( [^"\n]* )
" (?![a-zA-Z0-9\p{Punct}])
|
"
)
/$1$2/xg;
print "\n'$str'\n";
Output
'
string1=injunct! alter ego."
string2=successor "alter ego" single employer "a" free" proceeding "citation assets"
'
'
string1=injunct! alter ego.
string2=successor "alter ego" single employer "a" free proceeding "citation assets"
'
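Since the test case above is Perl only, here is a rough sketch of how the same find regex and the $1$2 replacement might be carried over to Java. The class name QuotePairCleaner and the method name clean are made up for illustration; the pattern is the shortened regex from this answer with Java string escaping applied. It relies on Java's replaceAll inserting nothing for a capture group that did not take part in the match.

```java
import java.util.regex.Pattern;

// Hypothetical helper: a Java port of the shortened regex from this answer.
class QuotePairCleaner {
    // Same three alternatives, in order: valid pair (group 1),
    // biased invalid pair (group 2 keeps the inner text), lone quote.
    private static final Pattern QUOTES = Pattern.compile(
        "(?:(\"[a-zA-Z0-9\\p{Punct}][^\"\\n]*(?<=[a-zA-Z0-9\\p{Punct}])\")"
        + "|(?<![a-zA-Z0-9\\p{Punct}])\"([^\"\\n]*)\"(?![a-zA-Z0-9\\p{Punct}])"
        + "|\")");

    public static String clean(String input) {
        // A group that did not participate in the match contributes nothing.
        return QUOTES.matcher(input).replaceAll("$1$2");
    }

    public static void main(String[] args) {
        System.out.println(clean("string1=injunct! alter ego.\""));
        System.out.println(clean(
            "string2=successor \"alter ego\" single employer \"a\" free\" proceeding \"citation assets\""));
    }
}
```

Run against the two lines from the Perl test case, this should print the same text as the Perl output above. Note that \p{Punct} includes the double quote character itself, which matters if two quotes can appear back to back.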
Answer 1 (score: 1)
You can use something like this (Perl notation):
s/("(?=\S)[^"]*(?<=\S)")|"/$1/g;
In Java that would be:
str.replaceAll("(\"(?=\\S)[^\"]*(?<=\\S)\")|\"", "$1");
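To check the one-liner against the question's sample input, it can be wrapped in a small runnable class. This is only a sketch: the names SimpleQuoteCleaner and clean are hypothetical, and the regex is exactly the one above. A pair is kept only when the character right after the opening quote and the character right before the closing quote are both non-whitespace; every other quote is dropped.

```java
// Hypothetical wrapper around the replaceAll call from this answer.
class SimpleQuoteCleaner {
    public static String clean(String str) {
        // Group 1 captures a well-formed pair; a lone quote matches the
        // second alternative, leaving group 1 unset, so it is replaced by nothing.
        return str.replaceAll("(\"(?=\\S)[^\"]*(?<=\\S)\")|\"", "$1");
    }

    public static void main(String[] args) {
        // prints: injunct! alter ego.
        System.out.println(clean("injunct! alter ego.\""));
        // prints: successor "alter ego" single employer proceeding "citation assets"
        System.out.println(clean(
            "successor \"alter ego\" single employer\" \"proceeding \"citation assets\""));
    }
}
```

Unlike the regex in the other answer, [^"]* here may cross line breaks, so a pair can span multiple lines of a single string.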