我似乎每年写一个Reg表达式并且最终总是寻求帮助。
这是一个字符串(它是来自Solr的搜索字符串),我想选择搜索字的每个实例。
这里是输入: -
http://server:8080/solr/app/select?q=(title_st_en%3Atheory+OR+title_st_ar%3Atheory+OR+title_st_da%3Atheory+OR+title_st_fr%3Atheory+OR+title_st_de%3Atheory+OR+title_st_it%3Atheory+OR+title_st_no%3Atheory+OR+title_st_sv%3Atheory+OR+title_st_ru%3Atheory+OR+title_st_es%3Atheory+OR+title_st_bg%3Atheory+OR+title_st_cs%3Atheory+OR+title_st_tr%3Atheory+OR+title_st_nl%3Atheory+OR+title_st_zh-cn%3Atheory+OR+title_st_zh-tw%3Atheory+OR+title_st_hr%3Atheory+OR+title_st_et%3Atheory+OR+title_st_he%3Atheory+OR+title_st_hu%3Atheory+OR+title_st_ja%3Atheory+OR+title_st_ko%3Atheory+OR+title_st_pl%3Atheory+OR+title_st_ro%3Atheory+OR+title_st_th%3Atheory+OR+title_st_vi%3Atheory+OR+content_stemming_en%3Atheory+OR+content_stemming_no%3Atheory+OR+(backfields%3Atheory))+AND+(((virtualPath%3A%22%5C%5CSERVER%5C%5CU_TEST%22+OR+virtualPath%3A%22%5C%5CSERVER%5C%5CP_SYSTEM%22+OR+virtualPath%3A%22%5C%5CSERVER%5C%5CP_!CONTACTS%22)+AND+-(virtualPath%3A%22%5C%5CSERVER%5C%5CU_TEST%5C%5CL%22+OR+virtualPath%3A%22%5C%5CSERVER%5C%5CP_SYSTEM%5C%5CL%22+OR+virtualPath%3A%22%5C%5CSERVER%5C%5CP_NDSF%5C%5CL%22+OR+virtualPath%3A%22%5C%5CSERVER%5C%5CP_NDSFMAG%5C%5CL%22+OR+virtualPath%3A%22%5C%5CSERVER%5C%5CP_NDSFRA%5C%5CL%22+OR+virtualPath%3A%22%5C%5CSERVER%5C%5CP_NM%5C%5CL%22+OR+virtualPath%3A%22%5C%5CSERVER%5C%5CP_INTERNAL%5C%5CL%22+OR+virtualPath%3A
我需要在每个' %3A
'之间选择任何文字。和' +OR
'以及最终的' %3Atheory
))' - 在这种情况下,单词' theory
'但每次都会有一个不同的词 - 唯一已知的是' %3A
'之间的任何alpha文字。和' +OR
'。它需要停在' +AND+
'
我已经到了/%3A(.*?)[+OR]/g
- 这是一个开始,我猜......
它没有找到' %3Atheory))
'并且它不会停留在+AND+
'
我正在努力找到这个'或者找到'以及在字符串处停止。
有人提供一些指导吗?
答案 0 :(得分:0)
如果您正在使用c#,最好使用String.Split和Regex.Matches分割两个操作:
string input = @"http://server:8080/solr/app/select?q=(title_st_en%3Atheory+OR+title_st_ar%3Atheory+OR+title_st_da%3Atheory+OR+title_st_fr%3Atheory+OR+title_st_de%3Atheory+OR+title_st_it%3Atheory+OR+title_st_no%3Atheory+OR+title_st_sv%3Atheory+OR+title_st_ru%3Atheory+OR+title_st_es%3Atheory+OR+title_st_bg%3Atheory+OR+title_st_cs%3Atheory+OR+title_st_tr%3Atheory+OR+title_st_nl%3Atheory+OR+title_st_zh-cn%3Atheory+OR+title_st_zh-tw%3Atheory+OR+title_st_hr%3Atheory+OR+title_st_et%3Atheory+OR+title_st_he%3Atheory+OR+title_st_hu%3Atheory+OR+title_st_ja%3Atheory+OR+title_st_ko%3Atheory+OR+title_st_pl%3Atheory+OR+title_st_ro%3Atheory+OR+title_st_th%3Atheory+OR+title_st_vi%3Atheory+OR+content_stemming_en%3Atheory+OR+content_stemming_no%3Atheory+OR+(backfields%3Atheory))+AND+(((virtualPath%3A%22%5C%5CSERVER%5C%5CU_TEST%22+OR+virtualPath%3A%22%5C%5CSERVER%5C%5CP_SYSTEM%22+OR+virtualPath%3A%22%5C%5CSERVER%5C%5CP_!CONTACTS%22)+AND+-(virtualPath%3A%22%5C%5CSERVER%5C%5CU_TEST%5C%5CL%22+OR+virtualPath%3A%22%5C%5CSERVER%5C%5CP_SYSTEM%5C%5CL%22+OR+virtualPath%3A%22%5C%5CSERVER%5C%5CP_NDSF%5C%5CL%22+OR+virtualPath%3A%22%5C%5CSERVER%5C%5CP_NDSFMAG%5C%5CL%22+OR+virtualPath%3A%22%5C%5CSERVER%5C%5CP_NDSFRA%5C%5CL%22+OR+virtualPath%3A%22%5C%5CSERVER%5C%5CP_NM%5C%5CL%22+OR+virtualPath%3A%22%5C%5CSERVER%5C%5CP_INTERNAL%5C%5CL%22+OR+virtualPath%3A";
Regex regex = new Regex(@"%3A(.*?)(?:\+OR|\)\))");
var splitted = input.Split(new[] { "AND" }, StringSplitOptions.None);
var matches = regex.Matches(splitted.First());
foreach (Match m in matches)
{
// Or whatever you like to do with your matches
Console.WriteLine(m.Groups[1].Value);
}
答案 1 :(得分:0)
Regex.Split
可以选择保留分隔字符串。因此,对于问题中给出的文本,下面的代码会将其分成几部分:
string[] pieces = Regex.Split(theInputText, "(%3A.*?\\+(?:AND|OR))");
foreach (string ss in pieces)
{
Console.WriteLine(ss);
}
以下是输出的一小部分:
+virtualPath
%3A%22%5C%5CSERVER%5C%5CP_SYSTEM%22+OR
+virtualPath
%3A%22%5C%5CSERVER%5C%5CP_!CONTACTS%22)+AND
+-(virtualPath
%3A%22%5C%5CSERVER%5C%5CU_TEST%5C%5CL%22+OR
+virtualPath
将字符串拆分成碎片后,使用正确的起始和结束字符筛选数组元素应该是一件简单的事情,也可以找到最后的%3Atheory...
条目。
注意:该问题讨论了+OR
和+AND+
,但所有+OR
后面都跟有+
,因此最好包含最终+
在表达式中,为...OR)\\+)
。
注意:正则表达式中的内括号是非捕获的,即(?: )
。如果他们捕获括号,那么AND
和OR
捕获将包含在输出数组中。