I'm working on a problem where I have to read a text file and count each word's frequency along with the line numbers it appears on.
For example, a .txt file that reads

Hi my name is
Bob. This is
Cool

should return (count, word, then the line numbers where the word appears):
1 Hi 1
1 my 1
1 name 1
2 is 1 2
1 bob 2
1 this 2
1 cool 3
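I can't figure out how to store the line numbers as well as the word frequency. I've tried a few different things, and this is where I am so far.
Any help?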
using System;
using System.Collections.Generic;
using System.Linq;

Dictionary<string, int> countDictionary = new Dictionary<string, int>();
Dictionary<string, List<int>> lineDictionary = new Dictionary<string, List<int>>();
List<string> lines = new List<string>();
System.IO.StreamReader file =
    new System.IO.StreamReader("Sample.txt");
// Creates a List of lines
string x;
while ((x = file.ReadLine()) != null)
{
    lines.Add(x);
}
foreach (var y in Enumerable.Range(0, lines.Count()))
{
    foreach (var word in lines[y].Split())
    {
        // Lower-case once so the insert and the later lookups use the same key
        string key = word.ToLower();
        if (!countDictionary.ContainsKey(key) && !lineDictionary.ContainsKey(key))
        {
            countDictionary.Add(key, 1);
            //lineDictionary.Add(key, /*what to put here*/);
        }
        else
        {
            countDictionary[key] += 1;
            //ADD line to dictionary???
        }
    }
}
foreach (var pair in countDictionary) //WHAT TO PUT HERE to print both
{
    Console.WriteLine("{0} {1}", pair.Value, pair.Key);
}
file.Close();
System.Console.ReadLine();
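For reference, one way to fill in the two commented-out TODOs while keeping the two-dictionary design is sketched below. It is a sketch under stated assumptions: the names TwoDictionaryIndex and Build are hypothetical, the lines come from an in-memory array instead of Sample.txt so the example runs on its own, line numbers are stored 1-based to match the expected output, and punctuation is not stripped (so "Bob." is counted as "bob.").

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class TwoDictionaryIndex
{
    // Fills both dictionaries in a single pass over the lines.
    // Line numbers are stored 1-based (y + 1) to match the question's expected output.
    public static (Dictionary<string, int> Counts, Dictionary<string, List<int>> Lines)
        Build(IList<string> lines)
    {
        var countDictionary = new Dictionary<string, int>();
        var lineDictionary = new Dictionary<string, List<int>>();

        for (int y = 0; y < lines.Count; y++)
        {
            // A null separator splits on whitespace; RemoveEmptyEntries skips blanks from repeated spaces.
            foreach (var word in lines[y].Split((char[])null, StringSplitOptions.RemoveEmptyEntries))
            {
                string key = word.ToLower();
                if (!countDictionary.ContainsKey(key))
                {
                    // First occurrence: start the count and a new list of line numbers.
                    countDictionary.Add(key, 1);
                    lineDictionary.Add(key, new List<int> { y + 1 });
                }
                else
                {
                    // Seen before: bump the count and append this line number.
                    countDictionary[key] += 1;
                    lineDictionary[key].Add(y + 1);
                }
            }
        }
        return (countDictionary, lineDictionary);
    }

    public static void Main()
    {
        // Hypothetical in-memory input; for a real file use File.ReadAllLines("Sample.txt").
        var lines = new[] { "Hi my name is", "Bob. This is", "Cool" };
        var (counts, lineNumbers) = Build(lines);

        // Both dictionaries share the same keys, so the line list can be looked up while printing the count.
        foreach (var pair in counts)
        {
            Console.WriteLine("{0} {1} {2}", pair.Value, pair.Key, string.Join(" ", lineNumbers[pair.Key]));
        }
    }
}
```

Since the two dictionaries always get the same keys, checking only one of them before inserting is enough.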
Answer 0 (score: 3)
You can do this with a single LINQ statement (File comes from System.IO, the query operators from System.Linq):

var processed =
    // get the lines of text as IEnumerable<string>
    File.ReadLines(@"myFilePath.txt")
    // get a word and a line number for every word,
    // so you'll have a sequence of objects with 2 properties:
    // word and lineNumber
    .SelectMany((line, lineNumber) => line.Split().Select(word => new { word, lineNumber }))
    // group these objects by their "word" property
    .GroupBy(x => x.word)
    // select what you need
    .Select(g => new
    {
        // number of objects in the group,
        // i.e. the frequency of the word
        Count = g.Count(),
        // the actual word
        Word = g.Key,
        // a sequence of line numbers of each instance of the
        // word in the group
        Positions = g.Select(x => x.lineNumber)
    });

foreach (var entry in processed)
{
    Console.WriteLine("{0} {1} {2}",
        entry.Count,
        entry.Word,
        string.Join(" ", entry.Positions));
}
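I like 0-based counting, so you may want to add 1 where appropriate (for example, lineNumber + 1 in the SelectMany) to get the 1-based line numbers shown in the question.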
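A variant of the query above, sketched under a few assumptions the question doesn't pin down: line numbers are 1-based, words are compared case-insensitively, and leading/trailing punctuation is trimmed so "Bob." and "Bob" count as the same word (the exact punctuation set is my assumption). The names LinqWordIndex and Build are hypothetical, and the input is an in-memory array so it runs on its own.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LinqWordIndex
{
    // Assumed set of characters to trim from the edges of each word; adjust as needed.
    private static readonly char[] Punctuation = { '.', ',', ';', ':', '!', '?', '"', '\'', '(', ')' };

    public static List<(int Count, string Word, List<int> Lines)> Build(IEnumerable<string> lines)
    {
        return lines
            // Pair every normalized word with its 1-based line number.
            .SelectMany((line, index) => line
                .Split((char[])null, StringSplitOptions.RemoveEmptyEntries)
                .Select(word => word.Trim(Punctuation).ToLowerInvariant())
                .Where(word => word.Length > 0)
                .Select(word => new { word, lineNumber = index + 1 }))
            // GroupBy yields groups in order of each key's first appearance.
            .GroupBy(x => x.word)
            .Select(g => (
                Count: g.Count(),
                Word: g.Key,
                // Distinct() lists a line once even if the word repeats on that line.
                Lines: g.Select(x => x.lineNumber).Distinct().ToList()))
            .ToList();
    }

    public static void Main()
    {
        // Hypothetical sample input; for a real file pass File.ReadLines(path) from System.IO.
        var text = new[] { "Hi my name is", "Bob. This is", "Cool" };
        foreach (var entry in Build(text))
        {
            Console.WriteLine("{0} {1} {2}", entry.Count, entry.Word, string.Join(" ", entry.Lines));
        }
    }
}
```

On the sample text this prints the question's expected output, except that "Hi" comes out as "hi" because of the lower-casing. Drop Distinct() if you want one line number per occurrence, as in the original query.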
Answer 1 (score: 1)
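You are tracking two different properties of the entity "word" in two separate data structures. I suggest creating a class that represents that entity, such as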
public class WordStats
{
    public string Word { get; set; }
    public int Count { get; set; }
    public List<int> AppearsInLines { get; set; }

    // The constructor must have the same name as the class
    public WordStats()
    {
        AppearsInLines = new List<int>();
    }
}
and then keep track of the words in

Dictionary<string, WordStats> wordStats = new Dictionary<string, WordStats>();
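Use the word itself as the key. When you encounter a word, check whether a WordStats instance with that key already exists. If it does, fetch it and update its Count and AppearsInLines properties; if not, create a new instance and add it to the dictionary.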
foreach (var y in Enumerable.Range(0, lines.Count()))
{
    foreach (var word in lines[y].Split())
    {
        WordStats wordStat;
        bool alreadyHave = wordStats.TryGetValue(word, out wordStat);
        if (alreadyHave)
        {
            wordStat.Count++;
            wordStat.AppearsInLines.Add(y);
        }
        else
        {
            wordStat = new WordStats();
            wordStat.Word = word;
            wordStat.Count = 1;
            wordStat.AppearsInLines.Add(y);
            wordStats.Add(word, wordStat);
        }