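I'm currently trying to build an application that does some text processing on a text file, using a dictionary to build a word index. The program should read the text file and, for each word, check whether that word has already been seen in the file and what its ID as a unique word is. It should then print the index number and the total number of occurrences of each word it encounters, continuing through the whole file, and produce something like this: http://pastebin.com/CjtcYchF
Here is a sample of the text file I'm feeding in: http://pastebin.com/ZRVbhWhV. A quick Ctrl-F shows that "not" occurs 2 times and "that" occurs 4 times. What I need to do is index every word and produce output like this: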
sample input : "that I have not that place sunrise beach like not good dirty beach trash beach"
dictionary:
index word
1 I
2 have
3 not
4 that
5 place
6 sunrise
7 beach
8 like
9 good
10 dirty
11 trash
output.txt / output.dat:
4:2 1:1 2:1 3:2 5:1 6:1 7:3 8:1 9:1 10:1 11:1
I tried to implement some code to build the dictionary. Here is what I have so far:
private void bagofword_Click(object sender, EventArgs e)
{
    //creating dictionary in background
    //Dictionary<string, int> dict = new Dictionary<string, int>();
    string rawinputbow = File.ReadAllText(textBox31.Text);
    //string[] inputbow = rawinputbow.Split(' ');
    var inputbow = rawinputbow.Split(" ".ToCharArray(), StringSplitOptions.RemoveEmptyEntries)
                              .ToList();
    var dict = new OrderedDictionary();
    var output = new List<int>();
    foreach (var element in inputbow.Select((word, index) => new { word, index }))
    {
        if (dict.Contains(element.word))
        {
            var count = (int)dict[element.word];
            dict[element.word] = ++count;
            output.Add(GetIndex(dict, element.word));
            //textBoxfile.Text = output.ToString();
            // textBoxfile.Text = inputbow.ToString();
            string result = string.Join(",", output);
            textBoxfile.Text = result.ToString();
        }
        else
        {
            dict[element.word] = 1;
            output.Add(GetIndex(dict, element.word));
            //textBoxfile.Text = dict.ToString();
            string result = string.Join(",", output);
            textBoxfile.Text = result.ToString();
        }
    }
}
public int GetIndex(OrderedDictionary dictionary, string key)
{
    for (int index = 0; index < dictionary.Count; index++)
    {
        if (dictionary[index] == dictionary[key])
            return index; // We found the item
        //textBoxfile.Text = index.ToString();
    }
    return -1;
}
Does anyone know how to complete this code? Any help is much appreciated!
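Answer 0 (score: 2)
Splitting on spaces is not enough. Your text contains tokens such as temple, photos. or cafes/restaraunts, where punctuation stays attached to the word. A better approach is to use a regular expression such as \w+. The words should also be compared case-insensitively.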
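To illustrate the difference, here is a minimal sketch that tokenizes the three problem tokens mentioned above both ways. The class and method names (TokenizeDemo, SplitOnSpaces, MatchWords) are made up for this illustration, and the case-insensitive check uses an ordinal comparison as an assumption; a culture-aware comparer would work similarly.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical demo class; not part of the original answer.
public static class TokenizeDemo
{
    // Splits on spaces only, so punctuation stays attached to the words.
    public static string[] SplitOnSpaces(string text) =>
        text.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);

    // Extracts runs of word characters (letters, digits, underscore).
    public static string[] MatchWords(string text) =>
        Regex.Matches(text, @"\w+").Cast<Match>().Select(m => m.Value).ToArray();

    public static void Main()
    {
        const string sample = "temple, photos. cafes/restaraunts";

        // prints: temple,|photos.|cafes/restaraunts
        Console.WriteLine(string.Join("|", SplitOnSpaces(sample)));

        // prints: temple|photos|cafes|restaraunts
        Console.WriteLine(string.Join("|", MatchWords(sample)));

        // Case-insensitive comparison treats "Beach" and "beach" as one word.
        // prints: True
        Console.WriteLine(string.Equals("Beach", "beach", StringComparison.OrdinalIgnoreCase));
    }
}
```

With the space split, "temple," and "temple" would be counted as two different words, which is why the regular expression is the safer choice here.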
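My approach would be:
// Requires: using System.IO; using System.Linq; using System.Text.RegularExpressions;
// Group every word match by its text (ignoring case) and remember the positions where it occurs.
var words = Regex.Matches(File.ReadAllText(filename), @"\w+").Cast<Match>()
    .Select((m, pos) => new { Word = m.Value, Pos = pos })
    .GroupBy(s => s.Word, StringComparer.CurrentCultureIgnoreCase)
    .Select(g => new { Word = g.Key, PosInText = g.Select(z => z.Pos).ToList() })
    .ToList();
// Print each unique word with the positions at which it appears.
foreach (var item in words)
{
    Console.WriteLine("{0,-15} POS:{1}", item.Word, string.Join(",", item.PosInText));
}
// Print "index:count" for each unique word (the index is 0-based here).
for (int i = 0; i < words.Count; i++)
{
    Console.Write("{0}:{1} ", i, words[i].PosInText.Count);
}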
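For comparison, the "index:count" line the question asks for could be produced along these lines. This is only a sketch under stated assumptions: indices are 1-based and assigned in order of first appearance (the question's sample numbers the words differently, so the indices will not match it), words are compared with an ordinal case-insensitive comparer, and the names WordIndexSketch and Summarize are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical sketch; not the original answer's code.
public static class WordIndexSketch
{
    // Returns space-separated "index:count" pairs. The index is 1-based and
    // assigned in order of first appearance (an assumption for this sketch).
    public static string Summarize(string text)
    {
        var counts = new Dictionary<string, int>(StringComparer.OrdinalIgnoreCase);
        var order = new List<string>(); // unique words in first-seen order

        foreach (Match m in Regex.Matches(text, @"\w+"))
        {
            if (counts.TryGetValue(m.Value, out int seen))
            {
                counts[m.Value] = seen + 1;
            }
            else
            {
                counts[m.Value] = 1;
                order.Add(m.Value);
            }
        }

        return string.Join(" ", order.Select((w, i) => $"{i + 1}:{counts[w]}"));
    }

    public static void Main()
    {
        const string sample =
            "that I have not that place sunrise beach like not good dirty beach trash beach";

        // prints: 1:2 2:1 3:1 4:2 5:1 6:1 7:3 8:1 9:1 10:1 11:1
        Console.WriteLine(Summarize(sample));
    }
}
```

Keeping the counts in a Dictionary and the first-seen order in a separate List avoids a linear scan to look up a word's index, which is what the GetIndex helper in the question does on every word.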
Answer 1 (score: 0)
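Answer 2 (score: -1)
Use this code:
string input = "that I have not that place sunrise beach like not good dirty beach trash beach";
// A null separator array splits on whitespace; the cast avoids an ambiguous overload on newer .NET versions.
var wordList = input.Split((char[])null);
// Count the occurrences of each word and sort by ascending count.
var output = wordList.GroupBy(x => x)
    .Select(x => new Word { charchter = x.Key, repeat = x.Count() })
    .OrderBy(x => x.repeat);
foreach (var item in output)
{
    textBoxfile.Text += item.charchter + " : " + item.repeat + Environment.NewLine;
}
The class that holds the data:
public class Word
{
    public string charchter { get; set; } // the word itself
    public int repeat { get; set; }       // number of occurrences
}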