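I'm trying to build a concordance. I have a dictionary that holds every word in the text together with how often that word occurs. Now I also need to store the lines on which each word appears. To do that, I thought I would build a container that stores every line, like this: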
List<String> eachLine = new List<string>();
using (var strReader = new StreamReader(@"pathToFile/Text.txt"))
{
    string line;
    while ((line = strReader.ReadLine()) != null)
    {
        eachLine.Add(line);
    }
}
Here is the dictionary:
Dictionary<string, int> concordanceDictionary = new Dictionary<string, int>();
string lines = File.ReadAllText(@"pathToFile/Text.txt").ToLower();
string[] words = SplitWords(lines);
foreach (var word in words)
{
    int i = 1;
    if (!concordanceDictionary.ContainsKey(word))
    {
        concordanceDictionary.Add(word, i);
    }
    else
    {
        concordanceDictionary[word]++;
    }
}
var list = concordanceDictionary.Keys.ToList();
list.Sort();
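To store the line numbers, I plan to create a list that holds the indices of the lines for each word. For every word in the dictionary I would use the Contains method to check whether the word occurs in a given entry of `List<String> eachLine`.
The question is: how do I display each word together with its list of line numbers? Perhaps you can suggest a more elegant and simpler way to do this.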
Answer 0: (score: 0)
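I would use a Dictionary<String,List<Int32>>, where the key is a String holding the current word and the List<Int32> is the list of line numbers on which that word appears. To get the occurrence count, just read the list's Count property: dictionary[ word ].Count.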
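By the way, you don't need to read everything into memory at once (as String[] instances). You can simply read the file character by character and recognise whitespace and line breaks as you go.
Here is my implementation of that: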
void Run() {
    Dictionary< String, List<Int32> > dict = new Dictionary< String, List<Int32> >();
    foreach(Tuple<String,Int32> wordOccurrence in GetWords()) {
        String word = wordOccurrence.Item1;
        Int32 line = wordOccurrence.Item2;
        if( !dict.ContainsKey( word ) ) dict.Add( word , new List<Int32>() );
        dict[ word ].Add( line );
    }
    foreach(String word in dict.Keys) {
        Console.WriteLine("\"{0}\" appeared {1} times, on these lines:", word, dict[word].Count);
        foreach(Int32 line in dict[word]) Console.WriteLine("\t{0}", line );
        Console.WriteLine("");
    }
}
IEnumerable< Tuple<String,Int32> > GetWords() {
    using(StreamReader rdr = new StreamReader("fileName")) {
        StringBuilder sb = new StringBuilder();
        Int32 nc; Char c;
        Int32 lineNumber = 0; // zero-based line index
        while( (nc = rdr.Read()) != -1 ) {
            c = (Char)nc;
            if( Char.IsWhiteSpace(c) ) {
                if( sb.Length > 0 ) {
                    // End of a word: emit it with the line it was found on
                    yield return Tuple.Create( sb.ToString(), lineNumber );
                    sb.Length = 0;
                }
                if( c == '\n' ) lineNumber++;
            } else {
                sb.Append( c );
            }
        }
        // Emit the last word if the file does not end in whitespace
        if( sb.Length > 0 ) yield return Tuple.Create( sb.ToString(), lineNumber );
    }
}
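For comparison, here is a minimal line-based sketch of the same word-to-line-numbers idea. It is not the code from this answer: the names ConcordanceSketch, Build and Format are hypothetical, it assumes words are separated by whitespace only, it lower-cases words as the question does, and it numbers lines from 1. A SortedDictionary is used so that the words come out in sorted order, which the question obtained by sorting the keys.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper for illustration; not part of the answer above.
public static class ConcordanceSketch
{
    // Maps each lower-cased word to the 1-based numbers of the lines it appears on.
    // A word that occurs twice on one line gets that line number twice,
    // so the list's Count is still the total number of occurrences.
    public static SortedDictionary<string, List<int>> Build(TextReader reader)
    {
        var index = new SortedDictionary<string, List<int>>(StringComparer.Ordinal);
        string line;
        int lineNumber = 0;
        while ((line = reader.ReadLine()) != null)
        {
            lineNumber++;
            // Assumption: whitespace is the only separator (a null separator
            // array makes Split use white-space characters).
            string[] words = line.ToLowerInvariant()
                .Split((char[])null, StringSplitOptions.RemoveEmptyEntries);
            foreach (string word in words)
            {
                List<int> lines;
                if (!index.TryGetValue(word, out lines))
                {
                    lines = new List<int>();
                    index.Add(word, lines);
                }
                lines.Add(lineNumber);
            }
        }
        return index;
    }

    // Formats one entry as: word (count): line, line, ...
    public static string Format(string word, List<int> lines)
    {
        return string.Format("{0} ({1}): {2}", word, lines.Count, string.Join(", ", lines));
    }
}
```

To use it, pass a StreamReader opened on the text file to ConcordanceSketch.Build, then loop over the returned entries and write ConcordanceSketch.Format(entry.Key, entry.Value) to the console.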
Answer 1: (score: 0)
Create a console application and run the following:
class Program
{
    static void Main(string[] args)
    {
        ReadTextToDictionary read = new ReadTextToDictionary();
        var strings = read.TextToListString(@"C:\stackoverflow\first.txt");
        var dictionarys = read.TextToDictionaryString(@"C:\stackoverflow\second.txt");
        foreach(var s in strings) {
            var compare = dictionarys.Where(a=>a.Value.Contains(s.ToString()));
            foreach(var f in compare)
            {
                Console.WriteLine(s+" in line "+f.Key.ToString() + " " + f.Value);
            }
        }
        Console.ReadKey();
    }
}
class ReadTextToDictionary
{
    public List<string> TextToListString(string path){
        var lines = System.IO.File.ReadAllLines(path);
        return lines.ToList();
    }
    public Dictionary<int,string> TextToDictionaryString(string path)
    {
        Dictionary<int, string> dstr = new Dictionary<int, string>();
        var lines = System.IO.File.ReadAllLines(path);
        int count = 0;
        foreach (var s in lines)
        {
            count++;
            dstr.Add(count, s);
        }
        return dstr;
    }
}
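Answer 2: (score: 0)
One approach is to store every line on which a word appears in a list, as the value part of a dictionary, with the word as the key.
In other words, you'd have a Dictionary<string, List<string>>, where the key is a word and the associated list holds all the lines that contain that word.
This way you get fast access to those lines, and you get the number of occurrences for free (dict[someWord].Count;).
For example: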
// words dictionary has a word key and a list of lines containing the word
var words = new Dictionary<string, List<string>>();
using (var strReader = new StreamReader(@"pathToFile/Text.txt"))
{
    string line;
    // Read each line
    while ((line = strReader.ReadLine()) != null)
    {
        // Get each word from the line
        var wordsInLine = line.ToLower().Split(' ');
        foreach (var word in wordsInLine)
        {
            // If this word already exists, add this line to its list
            if (words.ContainsKey(word))
            {
                words[word].Add(line);
            }
            // Otherwise, add a new word whose list starts with this line
            else
            {
                words.Add(word, new List<string> {line});
            }
        }
    }
}
If you really do want a list of all the lines, you can either add them to a list inside the loop above, or do the following:
var allLines = words.SelectMany(w => w.Value).Distinct().ToList();
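If line numbers are wanted as well as the line text, one possible variation is to keep the lines in a single list (like eachLine in the question) and store indices into that list in the dictionary. The following is only a sketch under that assumption; LineIndexSketch, Build and LinesFor are hypothetical names, and indices are zero-based.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Hypothetical helper for illustration; not part of the answer above.
public static class LineIndexSketch
{
    // Appends every line read to eachLine and returns, for each lower-cased word,
    // the zero-based indices into eachLine of the lines that contain it.
    public static Dictionary<string, List<int>> Build(TextReader reader, List<string> eachLine)
    {
        var words = new Dictionary<string, List<int>>();
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            eachLine.Add(line);
            int lineIndex = eachLine.Count - 1;
            // Split on spaces as in the answer, but skip the empty entries
            // that repeated spaces would otherwise produce.
            var wordsInLine = line.ToLower()
                .Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
            foreach (var word in wordsInLine)
            {
                List<int> indices;
                if (!words.TryGetValue(word, out indices))
                {
                    indices = new List<int>();
                    words.Add(word, indices);
                }
                indices.Add(lineIndex);
            }
        }
        return words;
    }

    // Returns the text of each line containing the word, each line only once.
    public static List<string> LinesFor(Dictionary<string, List<int>> words,
                                        List<string> eachLine, string word)
    {
        List<int> indices;
        if (!words.TryGetValue(word, out indices)) return new List<string>();
        return indices.Distinct().Select(i => eachLine[i]).ToList();
    }
}
```

With this layout, words[someWord].Count is still the number of occurrences, and the list of all lines is simply eachLine.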