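I want to search a file for a specific set of words (or, for now, a single word): "Jude". Below is my current code. I can read the file and it splits the lines into words, but comparing those words against a given word is where I'm stuck. (At the moment it is rigged to just count all words, and that output is correct.)
Thanks a lot, -Fred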
using System;

class Program
{
    static void Main()
    {
        string theLine;
        string theFile;
        int counter = 0;
        string[] fields = null;
        string delim = " ,.";
        Console.WriteLine("Please enter a filename:");
        theFile = Console.ReadLine();
        System.IO.StreamReader sr =
            new System.IO.StreamReader(theFile);
        while (!sr.EndOfStream)
        {
            theLine = sr.ReadLine();
            // Trim() returns a new string, so the result must be assigned back.
            theLine = theLine.Trim();
            fields = theLine.Split(delim.ToCharArray(), StringSplitOptions.RemoveEmptyEntries);
            // Currently counts every word, not only "Jude".
            counter += fields.Length;
        }
        sr.Close();
        Console.WriteLine("The word count is {0}", counter);
        Console.ReadLine();
    }
}
Answer 0 (score: 2)
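Using LINQ, you can enumerate the lines of the file, count the occurrences of the word (or words) in each line, and then add the per-line counts together: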
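using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

Console.WriteLine("Please enter a filename:");
var theFile = Console.ReadLine();
var delim = " ,.".ToCharArray();
// Words to count, upper-cased once so the comparison is case-insensitive.
var countWords = new HashSet<string>(new[] { "Jude" }.Select(w => w.ToUpperInvariant()));
// Split each line into words, count the ones in the set, then sum the per-line counts.
var count = File.ReadLines(theFile)
    .Select(l => l.Split(delim, StringSplitOptions.RemoveEmptyEntries)
                  .Count(w => countWords.Contains(w.ToUpperInvariant())))
    .Sum();
Console.WriteLine("The word count is {0}", count);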
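For comparison, here is a rough sketch of the same split-and-compare idea written as a plain loop, closer to the code in the question: each split field is checked against a set of target words instead of only being counted. The names WordCounter, CountMatches and CountMatchesInFile are hypothetical, and matching case-insensitively through StringComparer.OrdinalIgnoreCase is an assumption about what should count as a match.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class WordCounter
{
    // Separators taken from the question; converted to a char[] once, not per line.
    private static readonly char[] Delims = " ,.".ToCharArray();

    // Counts how many split tokens in `lines` equal one of `targetWords`,
    // ignoring case (so "jude" and "JUDE" both count for "Jude").
    public static int CountMatches(IEnumerable<string> lines, IEnumerable<string> targetWords)
    {
        var targets = new HashSet<string>(targetWords, StringComparer.OrdinalIgnoreCase);
        int counter = 0;
        foreach (string line in lines)
        {
            string[] fields = line.Split(Delims, StringSplitOptions.RemoveEmptyEntries);
            foreach (string field in fields)
            {
                if (targets.Contains(field))
                {
                    counter++;
                }
            }
        }
        return counter;
    }

    // Convenience wrapper for a file path; File.ReadLines streams the file
    // and closes it once enumeration finishes.
    public static int CountMatchesInFile(string path, IEnumerable<string> targetWords)
    {
        return CountMatches(File.ReadLines(path), targetWords);
    }
}
```

Usage would be something like int count = WordCounter.CountMatchesInFile(theFile, new[] { "Jude" });. Giving the HashSet a case-insensitive comparer avoids upper-casing every word of the file, which the LINQ version above does with ToUpperInvariant().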
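If you prefer @Dai's regular-expression approach, you can use it to count the occurrences in each line, while still using LINQ to process the lines and sum the counts: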
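using System;
using System.IO;
using System.Linq;
using System.Text.RegularExpressions;

Console.WriteLine("Please enter a filename:");
var theFile = Console.ReadLine();
var countWords = new[] { "Jude" };
// \b(?:word1|word2)\b matches any of the words as a whole word; Regex.Escape keeps
// words containing characters such as '.' or '+' from being read as regex syntax.
var wordPattern = new Regex(@"\b(?:" + String.Join("|", countWords.Select(Regex.Escape)) + @")\b",
                            RegexOptions.Compiled | RegexOptions.IgnoreCase);
// Count the matches in each line, then sum.
var count = File.ReadLines(theFile).Select(l => wordPattern.Matches(l).Count).Sum();
Console.WriteLine("The word count is {0}", count);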
Answer 1 (score: 1)
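- Avoid String.Split(), because it causes excess string allocations.
- Don't call ToCharArray() on every line inside the loop; you only need to cache the result.
- Use using() to ensure IDisposable objects are always disposed.

I suggest using a regular expression instead: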
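using System;
using System.IO;
using System.Text.RegularExpressions;
Console.WriteLine( "Please enter a filename:" );
String theFile = Console.ReadLine();
// Build the regex once, outside the loop; \b restricts matches to whole words.
Regex regex = new Regex( @"\bJude\b", RegexOptions.Compiled | RegexOptions.IgnoreCase );
Int32 count = 0;
// using() disposes the reader (closing the file) even if an exception is thrown.
using( StreamReader rdr = new StreamReader( theFile ) )
{
    String line;
    // ReadLine() returns null at the end of the file.
    while( ( line = rdr.ReadLine() ) != null )
    {
        count += regex.Matches( line ).Count;
    }
}
Console.WriteLine( "The word count is {0}", count );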
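The \b escape matches a "word boundary": the position between a word character and a non-word character (such as a space or punctuation), or the start or end of the string. So the pattern matches "Jude" in each of the following examples: "Jude", "Jude foo", "Foo Jude", "Hello. Jude.", but not in "JudeJude".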
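A small sketch for checking that boundary behaviour against the examples above. WordBoundaryDemo and CountWholeWord are hypothetical names, and wrapping the word in Regex.Escape is an addition not in the answer, so that a word containing regex metacharacters is matched literally. It assumes the searched word starts and ends with word characters, since \b behaves differently next to characters like '#' or '+'.

```csharp
using System.Text.RegularExpressions;

public static class WordBoundaryDemo
{
    // Counts case-insensitive, whole-word occurrences of `word` in `text`.
    // \b on both sides rejects matches glued to other word characters.
    public static int CountWholeWord(string text, string word)
    {
        var pattern = new Regex(@"\b" + Regex.Escape(word) + @"\b", RegexOptions.IgnoreCase);
        return pattern.Matches(text).Count;
    }
}
```

For example, CountWholeWord("Hello. Jude.", "Jude") returns 1, while CountWholeWord("JudeJude", "Jude") returns 0 because there is no boundary between the "e" and the second "J".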