Question

This question is very similar to:

Check a string for containing a list of substrings

Check if a string contains a list of substrings and save the matching ones

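With one exception: I need not only to check for a match, but also to get the starting index of the matched substring for further processing. Something like an IndexOf that accepts a List of strings: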

private List<string> matches = new List<string> { "one", "two", "three" };

while (index < text.Length && -1 != (index = text.IndexOf(matches, index))) 
{                       
   ...
   // also I need to identify which one of substrings has been matched
   index += matches[?].Length;
   // further text processing...
}

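In other words, I need to know whether the text string contains any of the substrings in the list (substrings, not whole words!) and, if so, get the start and end positions of the matched substring.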

P.S. The method must also be reasonably fast and case-insensitive.

Answer 1

Here is a LINQ approach that gets the index together with the matching keyword:

var matches = new List<string> { "one", "two", "three" };
var result = matches.Where(i => s.IndexOf(i, StringComparison.OrdinalIgnoreCase) > -1)
           .ToDictionary(m => s.IndexOf(m, StringComparison.OrdinalIgnoreCase), m => m);

Passing StringComparison.OrdinalIgnoreCase makes the comparison case-insensitive.

A non-LINQ way:

List<string> matches = new List<string> { "one", "two", "three" };
for (int h = 0; h < matches.Count; h++)
{
    int idx = s.IndexOf(matches[h], StringComparison.OrdinalIgnoreCase);
    if (idx > -1)
        Console.WriteLine(string.Format("Index: {0}, value: {1}",idx, matches[h]));
}
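
Both snippets above report only the first occurrence of each keyword. If every occurrence is needed, as the loop in the question suggests, one option is to call IndexOf repeatedly with a start index. The following is a minimal sketch under that assumption; the class name SubstringSearch, the method name FindAll, and the sample text are made up for illustration and are not part of the question or of any library.

```csharp
using System;
using System.Collections.Generic;

public static class SubstringSearch
{
    // Hypothetical helper: returns (start index, keyword) for every
    // non-overlapping occurrence of every keyword, ordered by position.
    public static List<(int Index, string Keyword)> FindAll(string text, IEnumerable<string> keywords)
    {
        var hits = new List<(int Index, string Keyword)>();
        foreach (string keyword in keywords)
        {
            // An empty keyword would match at every position, so skip it.
            if (string.IsNullOrEmpty(keyword)) continue;

            int idx = 0;
            while (idx < text.Length &&
                   (idx = text.IndexOf(keyword, idx, StringComparison.OrdinalIgnoreCase)) != -1)
            {
                hits.Add((idx, keyword));
                idx += keyword.Length; // resume the search after this occurrence
            }
        }
        hits.Sort((a, b) => a.Index.CompareTo(b.Index));
        return hits;
    }

    public static void Main()
    {
        var matches = new List<string> { "one", "two", "three" };
        foreach (var hit in FindAll("One and two, then ONE again", matches))
            Console.WriteLine("Index: {0}, value: {1}", hit.Index, hit.Keyword);
    }
}
```

The end position of each hit is Index + Keyword.Length. This scans the text once per keyword, so for long keyword lists the regular-expression approach may be the better fit.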

Here is a regular-expression approach that builds a dictionary of the matches and their indices in the input string:

List<string> matches = new List<string> { "one", "two", "three" };
matches = matches.Select(p => Regex.Escape(p)).ToList();
string s = "one and two and three";
var dict = Regex.Matches(s, string.Join("|", matches), RegexOptions.IgnoreCase).Cast<Match>()
                .ToDictionary(m => m.Index, m => m.Value);

Result: for this input the dictionary maps 0 to "one", 8 to "two", and 16 to "three".

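You need to use Match.Index to get the index of a match within the string. Make sure the regex pattern is valid; Regex.Escape helps here, because the search terms may contain ?, } or other characters that are special in regular expressions.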

The RegexOptions.IgnoreCase flag makes the matching case-insensitive.
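
To cover the rest of the question (start and end positions, and which list entry matched), the regex approach could be extended roughly as follows. This is only a sketch: KeywordScanner and Scan are hypothetical names, the end position is taken as exclusive, and ordering the keywords longest-first is an assumption made here so that a longer keyword is preferred over a shorter one starting at the same position.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class KeywordScanner
{
    // Hypothetical helper: returns (start, exclusive end, list entry) for each match.
    public static List<(int Start, int End, string Keyword)> Scan(string text, IEnumerable<string> keywords)
    {
        // Alternation tries alternatives left to right, so put longer keywords first.
        var ordered = keywords.Where(k => !string.IsNullOrEmpty(k))
                              .OrderByDescending(k => k.Length)
                              .ToList();
        var results = new List<(int Start, int End, string Keyword)>();
        if (ordered.Count == 0) return results;

        // Escape each keyword so characters such as ? or } are matched literally.
        string pattern = string.Join("|", ordered.Select(Regex.Escape));
        var options = RegexOptions.IgnoreCase | RegexOptions.CultureInvariant;

        foreach (Match m in Regex.Matches(text, pattern, options))
        {
            // Map the matched text back to the list entry, ignoring case;
            // fall back to the matched text if no entry compares equal.
            string keyword = ordered.FirstOrDefault(
                k => string.Equals(k, m.Value, StringComparison.OrdinalIgnoreCase)) ?? m.Value;
            results.Add((m.Index, m.Index + m.Length, keyword));
        }
        return results;
    }

    public static void Main()
    {
        var matches = new List<string> { "one", "two", "three" };
        foreach (var r in Scan("one and TWO and three", matches))
            Console.WriteLine("{0}..{1}: {2}", r.Start, r.End, r.Keyword);
    }
}
```

The text is scanned in a single pass and matches come back in order of position, which fits the kind of loop shown in the question.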
