REGEX在字符串中查找字符串

时间:2016-10-04 15:10:10

标签: c# regex

我似乎每年写一个Reg表达式并且最终总是寻求帮助。

这是一个字符串(它是来自Solr的搜索字符串),我想选择搜索字的每个实例。

这里是输入: -

http://server:8080/solr/app/select?q=(title_st_en%3Atheory+OR+title_st_ar%3Atheory+OR+title_st_da%3Atheory+OR+title_st_fr%3Atheory+OR+title_st_de%3Atheory+OR+title_st_it%3Atheory+OR+title_st_no%3Atheory+OR+title_st_sv%3Atheory+OR+title_st_ru%3Atheory+OR+title_st_es%3Atheory+OR+title_st_bg%3Atheory+OR+title_st_cs%3Atheory+OR+title_st_tr%3Atheory+OR+title_st_nl%3Atheory+OR+title_st_zh-cn%3Atheory+OR+title_st_zh-tw%3Atheory+OR+title_st_hr%3Atheory+OR+title_st_et%3Atheory+OR+title_st_he%3Atheory+OR+title_st_hu%3Atheory+OR+title_st_ja%3Atheory+OR+title_st_ko%3Atheory+OR+title_st_pl%3Atheory+OR+title_st_ro%3Atheory+OR+title_st_th%3Atheory+OR+title_st_vi%3Atheory+OR+content_stemming_en%3Atheory+OR+content_stemming_no%3Atheory+OR+(backfields%3Atheory))+AND+(((virtualPath%3A%22%5C%5CSERVER%5C%5CU_TEST%22+OR+virtualPath%3A%22%5C%5CSERVER%5C%5CP_SYSTEM%22+OR+virtualPath%3A%22%5C%5CSERVER%5C%5CP_!CONTACTS%22)+AND+-(virtualPath%3A%22%5C%5CSERVER%5C%5CU_TEST%5C%5CL%22+OR+virtualPath%3A%22%5C%5CSERVER%5C%5CP_SYSTEM%5C%5CL%22+OR+virtualPath%3A%22%5C%5CSERVER%5C%5CP_NDSF%5C%5CL%22+OR+virtualPath%3A%22%5C%5CSERVER%5C%5CP_NDSFMAG%5C%5CL%22+OR+virtualPath%3A%22%5C%5CSERVER%5C%5CP_NDSFRA%5C%5CL%22+OR+virtualPath%3A%22%5C%5CSERVER%5C%5CP_NM%5C%5CL%22+OR+virtualPath%3A%22%5C%5CSERVER%5C%5CP_INTERNAL%5C%5CL%22+OR+virtualPath%3A

我需要在每个' %3A'之间选择任何文字。和' +OR'以及最终的' %3Atheory))' - 在这种情况下,单词' theory'但每次都会有一个不同的词 - 唯一已知的是' %3A'之间的任何alpha文字。和' +OR'。它需要停在' +AND+'

我已经到了/%3A(.*?)[+OR]/g - 这是一个开始,我猜...... 它没有找到' %3Atheory))'并且它不会停留在+AND+'

我正在努力找到这个'或者找到'以及在字符串处停止。

有人提供一些指导吗?

2 个答案:

答案 0 :(得分:0)

如果您正在使用,最好使用String.Split和Regex.Matches分割两个操作:

string input = @"http://server:8080/solr/app/select?q=(title_st_en%3Atheory+OR+title_st_ar%3Atheory+OR+title_st_da%3Atheory+OR+title_st_fr%3Atheory+OR+title_st_de%3Atheory+OR+title_st_it%3Atheory+OR+title_st_no%3Atheory+OR+title_st_sv%3Atheory+OR+title_st_ru%3Atheory+OR+title_st_es%3Atheory+OR+title_st_bg%3Atheory+OR+title_st_cs%3Atheory+OR+title_st_tr%3Atheory+OR+title_st_nl%3Atheory+OR+title_st_zh-cn%3Atheory+OR+title_st_zh-tw%3Atheory+OR+title_st_hr%3Atheory+OR+title_st_et%3Atheory+OR+title_st_he%3Atheory+OR+title_st_hu%3Atheory+OR+title_st_ja%3Atheory+OR+title_st_ko%3Atheory+OR+title_st_pl%3Atheory+OR+title_st_ro%3Atheory+OR+title_st_th%3Atheory+OR+title_st_vi%3Atheory+OR+content_stemming_en%3Atheory+OR+content_stemming_no%3Atheory+OR+(backfields%3Atheory))+AND+(((virtualPath%3A%22%5C%5CSERVER%5C%5CU_TEST%22+OR+virtualPath%3A%22%5C%5CSERVER%5C%5CP_SYSTEM%22+OR+virtualPath%3A%22%5C%5CSERVER%5C%5CP_!CONTACTS%22)+AND+-(virtualPath%3A%22%5C%5CSERVER%5C%5CU_TEST%5C%5CL%22+OR+virtualPath%3A%22%5C%5CSERVER%5C%5CP_SYSTEM%5C%5CL%22+OR+virtualPath%3A%22%5C%5CSERVER%5C%5CP_NDSF%5C%5CL%22+OR+virtualPath%3A%22%5C%5CSERVER%5C%5CP_NDSFMAG%5C%5CL%22+OR+virtualPath%3A%22%5C%5CSERVER%5C%5CP_NDSFRA%5C%5CL%22+OR+virtualPath%3A%22%5C%5CSERVER%5C%5CP_NM%5C%5CL%22+OR+virtualPath%3A%22%5C%5CSERVER%5C%5CP_INTERNAL%5C%5CL%22+OR+virtualPath%3A";
Regex regex = new Regex(@"%3A(.*?)(?:\+OR|\)\))");

var splitted = input.Split(new[] { "AND" }, StringSplitOptions.None);
var matches = regex.Matches(splitted.First());

foreach (Match m in matches)
{
    // Or whatever you like to do with your matches
    Console.WriteLine(m.Groups[1].Value);
}

答案 1 :(得分:0)

Regex.Split可以选择保留分隔字符串。因此,对于问题中给出的文本,下面的代码会将其分成几部分:

string[] pieces = Regex.Split(theInputText, "(%3A.*?\\+(?:AND|OR))");
foreach (string ss in pieces)
{
    Console.WriteLine(ss);
}

以下是输出的一小部分:

+virtualPath
%3A%22%5C%5CSERVER%5C%5CP_SYSTEM%22+OR
+virtualPath
%3A%22%5C%5CSERVER%5C%5CP_!CONTACTS%22)+AND
+-(virtualPath
%3A%22%5C%5CSERVER%5C%5CU_TEST%5C%5CL%22+OR
+virtualPath

将字符串拆分成碎片后,使用正确的起始和结束字符筛选数组元素应该是一件简单的事情,也可以找到最后的%3Atheory...条目。

注意:该问题讨论了+OR+AND+,但所有+OR后面都跟有+,因此最好包含最终+在表达式中,为...OR)\\+)

注意:正则表达式中的内括号是非捕获的,即(?: )。如果他们捕获括号,那么ANDOR捕获将包含在输出数组中。