I would like to replace | with OR, but only where the | appears outside quoted terms, e.g.:
"this | that" | "the | other" -> "this | that" OR "the | other"
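Yes, I could split on spaces or quotes, get an array, iterate over it, and rebuild the string, but that seems... inelegant. So perhaps there is a regex way to do this by counting the " characters that precede each |: an odd count means the | is quoted, and an even count means it is unquoted. (Note: if there is at least one ", processing does not start until an even number of " has been seen.)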
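Answer 0 (score: 11)
Regexes can't count, but they can be used to determine whether there is an odd or even number of something. The trick in this case is to examine the quotes after the pipe, not before it.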
str = str.replace(/\|(?=(?:(?:[^"]*"){2})*[^"]*$)/g, "OR");
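Breaking that down, (?:[^"]*"){2} matches the next pair of quotes, if there is one, along with the intervening non-quotes. After you've done that as many times as possible (which might be zero), [^"]*$ consumes any remaining non-quotes up to the end of the string.
Of course, this assumes the text is well-formed. It doesn't address the problem of escaped quotes either, but it can if you need it to.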
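To see the lookahead in action, here is a minimal JavaScript sketch. The function name replaceByLookahead and the sample inputs are illustrative assumptions, and the quotes are assumed to be balanced and unescaped, as the answer notes.

```javascript
// Hypothetical wrapper around the regex from this answer.
// A pipe is replaced only when an even number of quotes follows it,
// which, for balanced quotes, means the pipe itself sits outside quotes.
function replaceByLookahead(str) {
  return str.replace(/\|(?=(?:(?:[^"]*"){2})*[^"]*$)/g, "OR");
}

console.log(replaceByLookahead('"this | that" | "the | other"'));
// prints: "this | that" OR "the | other"
console.log(replaceByLookahead('"this | that" | "the | other" | yet | another'));
// prints: "this | that" OR "the | other" OR yet OR another
```

Because the lookahead rescans the rest of the string for every pipe, this is probably best suited to short inputs such as search queries.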
Answer 1 (score: 5)
Regexes don't count. That's what parsers are for.
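As one possible reading of "use a parser", a single-pass scanner that tracks whether it is inside quotes is enough for this input format. The sketch below is an assumption about what such a parser could look like, not something the answer spelled out; the name replaceByScanning is hypothetical, and escaped quotes are not handled.

```javascript
// Hypothetical one-pass scanner: toggles a flag on every quote and
// rewrites a pipe only while the flag says we are outside quotes.
function replaceByScanning(str) {
  let inQuotes = false;
  let out = "";
  for (const ch of str) {
    if (ch === '"') {
      inQuotes = !inQuotes; // entering or leaving a quoted term
      out += ch;
    } else if (ch === "|" && !inQuotes) {
      out += "OR"; // bare pipe between terms
    } else {
      out += ch; // everything else is copied unchanged
    }
  }
  return out;
}

console.log(replaceByScanning('"this | that" | "the | other" | yet | another'));
// prints: "this | that" OR "the | other" OR yet OR another
```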
Answer 2 (score: 4)
You may find the Perl FAQ on this issue relevant.
#!/usr/bin/perl
use strict;
use warnings;
my $x = qq{"this | that" | "the | other"};
print join('" OR "', split /" \| "/, $x), "\n";
Answer 3 (score: 1)
You don't need to count, because you don't nest quotes. This will do it:
#!/usr/bin/perl
my $str = '" this \" | that" | "the | other" | "still | something | else"';
print "$str\n";
while($str =~ /^((?:[^"|\\]*|\\.|"(?:[^\\"]|\\.)*")*)\|/) {
    $str =~ s/^((?:[^"|\\]*|\\.|"(?:[^\\"]|\\.)*")*)\|/$1OR/;
}
print "$str\n";
Now, let's explain this expression.
^ -- means you'll always match everything from the beginning of the string; otherwise
    the match might start inside a quote and break everything.
(...)\| -- this means you'll match a certain pattern, followed by a |, which appears
    escaped here; so when you replace it with $1OR, you keep everything but
    replace the |.
(?:...)* -- This is a non-capturing group, which can be repeated multiple times; we
    use a group here so that the alternative patterns can be repeated.
[^"|\\]* -- This is the first pattern. Anything that isn't a pipe, an escape character
    or a quote.
\\. -- This is the second pattern. Basically, an escape character and anything
    that follows it.
"(?:...)*" -- This is the third pattern. An opening quote, followed by another
    non-capturing group repeated multiple times, followed by a closing
    quote.
[^\\"] -- This is the first pattern in the second non-capturing group. It's anything
    except an escape character or a quote.
\\. -- This is the second pattern in the second non-capturing group. It's an
    escape character and whatever follows it.
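For readers working in JavaScript, the same escape-aware idea can be sketched without a loop: match complete quoted strings, escape sequences, and pipes as alternatives, and rewrite only the bare pipes. This is a swapped-in technique rather than a port of the Perl above; the names TOKEN and replaceByTokens are illustrative assumptions.

```javascript
// Alternatives, tried left to right at each position:
//   "(?:[^"\\]|\\.)*"  a complete quoted string, allowing \" inside it
//   \\.                an escaped character outside quotes
//   \|                 a bare pipe
const TOKEN = /"(?:[^"\\]|\\.)*"|\\.|\|/g;

function replaceByTokens(str) {
  // Quoted strings and escapes are returned unchanged; only a bare pipe becomes OR.
  return str.replace(TOKEN, (m) => (m === "|" ? "OR" : m));
}

// String.raw keeps the backslash in \" as a literal character.
console.log(replaceByTokens(String.raw`" this \" | that" | "the | other" | "still | something | else"`));
// prints: " this \" | that" OR "the | other" OR "still | something | else"
```

Since a quoted string is consumed as a whole before the pipe alternative is ever tried inside it, no counting or anchoring to the start of the string is needed.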
The Perl script prints:
" this \" | that" | "the | other" | "still | something | else"
" this \" | that" OR "the | other" OR "still | something | else"
Answer 4 (score: 1)
Another approach (similar to Alan M's working answer):
str = str.replace(/(".+?"|\w+)\s*\|\s*/g, '$1 OR ');
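The part inside the first group (spaced out for readability):
".+?" | \w+
...basically means something quoted, or a word. The rest means that it is followed by a "|" wrapped in optional whitespace. The replacement is that first part ("$1" refers to the first group) followed by " OR ".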
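A short JavaScript sketch of this pattern follows; the function name replaceWithGroup and the inputs are illustrative assumptions. It also shows one edge case worth knowing about: when a quoted term is not followed by a pipe (for example, the last term in the string), the quoted alternative fails there, and the \w+ alternative can then match a word inside the quotes.

```javascript
// Hypothetical wrapper around the regex from this answer.
function replaceWithGroup(str) {
  return str.replace(/(".+?"|\w+)\s*\|\s*/g, "$1 OR ");
}

console.log(replaceWithGroup('"this | that" | "the | other" | yet | another'));
// prints: "this | that" OR "the | other" OR yet OR another

// Edge case: the final quoted term has no pipe after it, so the pipe
// inside it is reached through the \w+ alternative and gets replaced too.
console.log(replaceWithGroup('"this | that" | "the | other"'));
// prints: "this | that" OR "the OR other"
```

If the query may end with a quoted term, the lookahead-based regex from the top answer avoids this case.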
Answer 5 (score: 0)
Maybe you are looking for something like this:
(?<=^([^"]*"[^"]*")+[^"|]*)\|
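Answer 6 (score: 0)
Thanks, everyone. Apologies for neglecting to mention that this is in JavaScript, that terms don't have to be quoted, and that there can be any number of quoted or unquoted terms, e.g.: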
"this | that" | "the | other" | yet | another -> "this | that" OR "the | other" OR yet OR another
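Daniel, that seems to be in the ballpark, i.e. basically a match/massage loop. Thanks for the detailed explanation. In JS, it looks like a split, a forEach loop over the array of terms that pushes each term back onto an array (after changing a | term to OR), and then a re-join.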
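The split / forEach / re-join approach described here might look roughly like the following. This is a sketch under assumptions: the name replaceBySplitJoin is hypothetical, the tokenizing step uses match rather than split so that quoted terms stay in one piece, and pipes are assumed to be separated from terms by whitespace.

```javascript
// Hypothetical split-and-rebuild version of the replacement.
function replaceBySplitJoin(str) {
  // Tokenize into quoted terms or runs of non-whitespace characters.
  const terms = str.match(/"[^"]*"|\S+/g) || [];
  const out = [];
  terms.forEach((term) => {
    // A token that is exactly "|" is a separator between terms.
    out.push(term === "|" ? "OR" : term);
  });
  // Re-joining with single spaces normalizes the original spacing.
  return out.join(" ");
}

console.log(replaceBySplitJoin('"this | that" | "the | other" | yet | another'));
// prints: "this | that" OR "the | other" OR yet OR another
```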
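Answer 7 (score: 0)
@Alan M, works well; there is no need for escaping because sqlite's FTS capabilities are so sparse.
@epost, accepted solution for its brevity and elegance, thanks. It just needed to be put in a more general form for Unicode etc.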
(".+?"|[^\"\s]+)\s*\|\s*
Answer 8 (score: 0)
My solution in C#, which counts the quotes and then uses a regex to get the matches:
// Requires: using System.Collections.Generic; using System.Linq; using System.Text.RegularExpressions;
// searchText is the input string, defined elsewhere.

// Count the number of quotes.
var quotesOnly = Regex.Replace(searchText, @"[^""]", string.Empty);
var quoteCount = quotesOnly.Length;
if (quoteCount > 0)
{
    // If the quote count is an odd number there's a missing quote.
    // Assume a quote is missing from the end - executive decision.
    if (quoteCount % 2 == 1)
    {
        searchText += @"""";
    }

    // Get the runs of text between quotes. Exclude the quotes themselves.
    // e.g. The following line:
    //   "this and that" or then and "this or other"
    // will result in the following values after trimming and dropping empties:
    //   1. this and that
    //   2. or then and
    //   3. this or other
    var matches = Regex.Matches(searchText, @"([^\""]*)", RegexOptions.Singleline);
    var list = new List<string>();
    foreach (var match in matches.Cast<Match>())
    {
        var value = match.Groups[0].Value.Trim();
        if (!string.IsNullOrEmpty(value))
        {
            list.Add(value);
        }
    }

    // TODO: Do something with the list of strings.
}