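I am trying to count the occurrences of each word in a text file (case-insensitively) and store each word together with its count in a list.
This is the class for the objects I want to store in the list: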
public class WordItem
{
    public string Word { get; set; }
    public int Count { get; set; }
}
And this is my function for parsing the text file:
public List<WordItem> FindWordCount()
{
    //I've successfully parsed the text file into a list
    //of words and stripped punctuation up to this point
    //and stored them in List<string> wordlist.
    List<string> wordlist;
    List<WordItem> entries = new List<WordItem>();
    foreach (string word in wordlist)
    {
        // Adds a new entry for every word, so repeated words end up duplicated.
        WordItem temp = new WordItem();
        temp.Word = word;
        temp.Count = 1;
        entries.Add(temp);
    }
    return entries;
}
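How can I change the word count function so that the list contains no duplicate words, and the count is incremented each time a word is found again?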
Answer 0 (score: 7)
I would use a Dictionary<string, int> with a case-insensitive string comparer:
public IEnumerable<WordItem> FindWordCount(IEnumerable<string> wordlist)
{
    // The comparer makes "A" and "a" map to the same key.
    var wordCount = new Dictionary<string, int>(StringComparer.CurrentCultureIgnoreCase);
    foreach (string word in wordlist)
    {
        // TryGetValue leaves count at 0 when the word has not been seen yet.
        int count;
        wordCount.TryGetValue(word, out count);
        wordCount[word] = count + 1;
    }
    foreach (var kv in wordCount)
        yield return new WordItem { Word = kv.Key, Count = kv.Value };
}
You can use it like this:
var wordList = new string[] { "A", "a", "b", "C", "a", "b" };
var wordCounts = FindWordCount(wordList).ToList();
Answer 1 (score: 0)
There is also a nice one-liner solution:
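// Distinct needs a case-insensitive comparer too; otherwise "A" and "a" each get their own entry.
IEnumerable<WordItem> countedList = wordlist
    .Distinct(StringComparer.InvariantCultureIgnoreCase)
    .Select(word => new WordItem()
    {
        Word = word,
        Count = wordlist.Count(compWord => word.Equals(compWord, StringComparison.InvariantCultureIgnoreCase))
    });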
Or, if you prefer a dictionary so that you can look up specific words later:
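// The comparer is passed to both Distinct and ToDictionary so that keys and lookups ignore case.
Dictionary<string, int> dictionary = wordlist
    .Distinct(StringComparer.InvariantCultureIgnoreCase)
    .ToDictionary<string, string, int>(
        word => word,
        word => wordlist.Count(compWord => word.Equals(compWord, StringComparison.InvariantCultureIgnoreCase)),
        StringComparer.InvariantCultureIgnoreCase);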
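This performs somewhat worse than Tim Smelters's solution, because calling Count() once per distinct word makes it O(n^2). On the other hand, with C# 6.0 you can define the method as an expression-bodied member, using => instead of a block body.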
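To illustrate the C# 6.0 remark, here is a minimal sketch of the one-liner wrapped in an expression-bodied method. The class name WordCounter and the method signature are assumptions made for the example, not part of the original answer, and the Count() call still makes it O(n^2).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class WordItem
{
    public string Word { get; set; }
    public int Count { get; set; }
}

// Hypothetical container class, introduced only for this sketch.
public static class WordCounter
{
    // Expression-bodied method (C# 6.0+): "=>" replaces the braces and the return statement.
    // Count() rescans the whole sequence for each distinct word, hence quadratic time.
    public static IEnumerable<WordItem> FindWordCount(IEnumerable<string> wordlist) =>
        wordlist
            .Distinct(StringComparer.InvariantCultureIgnoreCase)
            .Select(word => new WordItem
            {
                Word = word,
                Count = wordlist.Count(other =>
                    word.Equals(other, StringComparison.InvariantCultureIgnoreCase))
            });
}
```

Calling WordCounter.FindWordCount(new[] { "A", "a", "b", "C", "a", "b" }).ToList() should give three items whose counts add up to six.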
Answer 2 (score: 0)
Simple, and it fits your types:
public string[] wordList;
public class WordItem
{
    public string Word { get; set; }
    public int Count { get; set; }
}
public IEnumerable<WordItem> FindWordCount()
{
    return from word in wordList
           group word by word.ToLowerInvariant() into g
           select new WordItem { Word = g.Key, Count = g.Count() };
}
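The same grouping can also be written in method syntax with a comparer passed to GroupBy, which avoids lowercasing the words. This is only a sketch: the class name GroupingWordCounter, the parameter, and the choice of StringComparer.OrdinalIgnoreCase are assumptions for the example rather than something the answer specifies.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class WordItem
{
    public string Word { get; set; }
    public int Count { get; set; }
}

// Hypothetical container class, introduced only for this sketch.
public static class GroupingWordCounter
{
    public static List<WordItem> FindWordCount(IEnumerable<string> wordList)
    {
        return wordList
            // Words that differ only in case fall into the same group.
            .GroupBy(word => word, StringComparer.OrdinalIgnoreCase)
            // Each group becomes one WordItem holding the group's size.
            .Select(g => new WordItem { Word = g.Key, Count = g.Count() })
            .ToList();
    }
}
```

Because nothing is lowercased, each reported word keeps one of the casings that appeared in the input instead of always being lowercase.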