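How to count occurrences of a substring within a string

Time: 2016-11-24 06:12:48

Tags: c# string linq

I am trying to read a text file and count how many times a certain string occurs. This is what I have so far: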

System.IO.StreamReader file = new System.IO.StreamReader("C:\\Users\\Test\\Documents\\Sample.txt");
while ((line = file.ReadLine()) != null) {
    Console.WriteLine(line);

    counter = Regex.Matches(line, "the", RegexOptions.IgnoreCase).Count;
}

Console.WriteLine(counter);

file.Close();

// Suspend the screen.
Console.ReadLine();

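So I want to find all words that contain the string "the", but I am not getting the correct count. I want it to count the "the" inside words such as "wither" as well, not only the standalone word "the". The problem I found is that when the txt file contains separate paragraphs with blank lines between them, it misses words. When there are no blank lines between the paragraphs, it seems to work. What can I do to fix this?

This is what I mean by paragraph spacing: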

Sample text Sample text  Sample text  Sample text Sample text.

Sample text Sample text Sample text Sample text Sample text .

But it works if I combine them like this:

Sample text Sample text  Sample text  Sample text Sample text.Sample text Sample text  Sample text  Sample text Sample text.

5 Answers:

Answer 0: (score: 2)

You need to add to the count instead of overwriting it on every iteration:

System.IO.StreamReader file = new System.IO.StreamReader("C:\\Users\\Test\\Documents\\Sample.txt");
while ((line = file.ReadLine()) != null)
{
     Console.WriteLine(line);
     // Add to the count instead of overwriting it every time
     counter += Regex.Matches(line, "the", RegexOptions.IgnoreCase).Count; 
}
Console.WriteLine(counter);
file.Close();
// Suspend the screen.
Console.ReadLine();
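
To see why the overwritten counter goes wrong with blank lines: an empty line produces zero matches, so with `=` the final value is just the count of the last line read. Below is a minimal, self-contained sketch of the accumulating version. It reads from a StringReader with made-up sample text instead of the file from the question, and the names CountDemo and CountOccurrences are only illustrative.

```csharp
using System;
using System.IO;
using System.Text.RegularExpressions;

public static class CountDemo
{
    // Sums the case-insensitive matches of `term` over every line of `reader`.
    public static int CountOccurrences(TextReader reader, string term)
    {
        int counter = 0;
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            // A blank line contributes 0, so it no longer wipes out the total.
            counter += Regex.Matches(line, Regex.Escape(term), RegexOptions.IgnoreCase).Count;
        }
        return counter;
    }

    public static void Main()
    {
        // Hypothetical sample: two paragraphs separated by an empty line.
        string text = "The weather withers the leaves.\n\nAnother paragraph, then nothing else.";
        using (var reader = new StringReader(text))
        {
            Console.WriteLine(CountOccurrences(reader, "the"));
        }
    }
}
```

Taking a TextReader means the same method should work unchanged with the StreamReader from the question.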

Answer 1: (score: 1)

If you want to show the count for each line, you have to move Console.WriteLine(counter); inside the body of the while loop.

string searchStr= "the";
while ((line = file.ReadLine()) != null)
{
    Console.WriteLine(line);
    counter = Regex.Matches(line,searchStr, RegexOptions.IgnoreCase).Count;
    Console.WriteLine("Count of {0} in this line is {1}",searchStr,counter);
}

If you instead add to counter on every iteration of the while loop, you can display the total count of the search term.

string searchStr= "the";
 while ((line = file.ReadLine()) != null)
 {
     Console.WriteLine(line);
     counter += Regex.Matches(line, searchStr , RegexOptions.IgnoreCase).Count;
 }
 Console.WriteLine("Occurrence of {0} in this document is {1}",searchStr,counter);

Update: To get all the words that contain the search string, together with the number of matching words in the given content, you can use a list as follows:

 string searchStr= "the";
 List<string> totalMatchStrings = new List<string>();
 while ((line = file.ReadLine()) != null)
 {
     // Keep every space-separated word that contains the search string, ignoring case
     totalMatchStrings.AddRange(line.Split(' ').Where(x => x.ToLower().Contains(searchStr)));
 }
 string matchingWords = String.Join(",", totalMatchStrings.Distinct());
 Console.WriteLine("Occurrence of {0} in this document is {1}",searchStr,totalMatchStrings.Count);
 Console.WriteLine("matching words are : {0}",matchingWords );
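
Note that Split(' ') only splits on spaces, so punctuation stays attached to a word and "the." is treated as different from "the" by Distinct. One possible alternative, offered as a sketch rather than as part of the answer above, is to let a regex expand each hit to the whole word around it. The names WordMatchDemo and WordsContaining and the sample sentence are assumptions made for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class WordMatchDemo
{
    // Returns every word (run of \w characters) that contains `term`, ignoring case.
    public static List<string> WordsContaining(string text, string term)
    {
        // \w* on both sides grows the match to the full surrounding word.
        string pattern = @"\w*" + Regex.Escape(term) + @"\w*";
        return Regex.Matches(text, pattern, RegexOptions.IgnoreCase)
                    .Cast<Match>()
                    .Select(m => m.Value)
                    .ToList();
    }

    public static void Main()
    {
        // Hypothetical sample text.
        List<string> words = WordsContaining("The weather withers the leaves.", "the");
        Console.WriteLine("Matching words: {0}", words.Count);
        Console.WriteLine(String.Join(",", words.Distinct(StringComparer.OrdinalIgnoreCase)));
    }
}
```

This counts matching words, not occurrences: a word that contains the term twice is reported once.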

Answer 2: (score: 1)

var allLines = File.ReadAllLines(@"C:\POC\input.txt");
var theCount = allLines.SelectMany(l => l.Split(' '))
        .Where(l => l.ToLower().Contains("the"))
        .Count();

Answer 3: (score: 0)

If you are using .NET 3.5, you can do this in a single line with LINQ:

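// A char literal cannot hold "the", so count the 3-character windows that match instead.
int count = Enumerable.Range(0, Math.Max(0, line.Length - 2)).Count(i => string.Compare(line, i, "the", 0, 3, StringComparison.OrdinalIgnoreCase) == 0);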

Answer 4: (score: 0)

When reading line by line, you can use the following code inside the loop and add up the count for each line.

Regex.Matches(input, Regex.Escape("the"), RegexOptions.IgnoreCase).Count
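
Regex.Escape is meant for text that is used as a pattern, so it matters once the search term may contain regex metacharacters. Here is a minimal sketch of the difference; the helper name CountLiteral and the sample strings are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class EscapeDemo
{
    // Counts literal, case-insensitive occurrences of `term` in `input`.
    public static int CountLiteral(string input, string term)
    {
        // Escaping the term makes characters such as '.' match only themselves.
        return Regex.Matches(input, Regex.Escape(term), RegexOptions.IgnoreCase).Count;
    }

    public static void Main()
    {
        string input = "a.b axb";
        // Unescaped, '.' matches any character, so "axb" is counted as well.
        Console.WriteLine(Regex.Matches(input, "a.b").Count);
        // Escaped, only the literal text "a.b" is counted.
        Console.WriteLine(CountLiteral(input, "a.b"));
    }
}
```

For a plain term such as "the" both forms give the same count; the escaping only matters for terms that contain characters like '.', '*' or '('.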