I want to use Stanford NER in C# to read all the files in a folder and write the results to a file in the format "document label entity".
This is what I have:
using System;
using System.IO;
using edu.stanford.nlp.ie.crf;
using edu.stanford.nlp.util;

namespace stanfordNER
{
    class Program
    {
        public static CRFClassifier Classifier = CRFClassifier.getClassifierNoExceptions(@"english.all.3class.distsim.crf.ser.gz");

        static void Main(string[] args)
        {
            Console.WriteLine("directory address?");
            string dir = Console.ReadLine();

            // Reads all files in directory
            string[] files = System.IO.Directory.GetFiles(dir);
            foreach (string f in files)
            {
                // Get the document name
                string docNo = Path.GetFileName(Path.GetFullPath(f).TrimEnd(Path.DirectorySeparatorChar));
                Console.WriteLine(docNo);
                string docText = System.IO.File.ReadAllText(f);
                var classified = Classifier.classifyFile(f).toArray();

                // Error here when running
                // Should output the entities; this part is the work of Stewart Whiting (STEWH)
                for (int i = 0; i < classified.Length; i++)
                {
                    Triple triple = (Triple)classified[i];
                    int second = Convert.ToInt32(triple.second().ToString());
                    int third = Convert.ToInt32(triple.third().ToString());
                    Console.WriteLine(docNo + '\t' + triple.first().ToString() + '\t' + docText.Substring(second, third - second));
                }
            }
        }
    }
}
I get an invalid cast exception (InvalidCastException) at the "triple" line. I don't understand how to use Triple.
Example of the output I want:
wiki-ms ORGANIZATION Microsoft Corporation
wiki-ms LOCATION Redmond
wiki-ms LOCATION Washington
wiki-ms ORGANIZATION Microsoft
wiki-ms ORGANIZATION Microsoft Office
wiki-ms ORGANIZATION Microsoft
wiki-ms PERSON Bill Gates
wiki-ms PERSON Paul Allen
wiki-ms ORGANIZATION Microsoft
wiki-ms ORGANIZATION Microsoft
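Thanks in advance! I'm a manufacturing engineer, so my programming knowledge is very poor.
If you have a way to filter out duplicate and/or similar entities, that would be a bonus!
Thanks to Stewart Whiting. His Site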
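One possible way to do the duplicate filtering, sketched under a few assumptions: the results are first collected as "document<TAB>label<TAB>entity" strings (the same format the loop above prints), and two lines count as "similar" only when they differ in letter case or extra whitespace. The class name EntityDeduplicator and its Distinct method are hypothetical and not part of Stanford NER. Because the document name is part of each line, duplicates are removed per document, and the first occurrence of each entity is kept in its original order.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper (not part of Stanford NER): drops repeated
// "document<TAB>label<TAB>entity" lines while keeping the original order.
public static class EntityDeduplicator
{
    public static List<string> Distinct(IEnumerable<string> lines)
    {
        // Case-insensitive set of normalized keys already seen.
        var seen = new HashSet<string>(StringComparer.OrdinalIgnoreCase);
        var result = new List<string>();

        foreach (string line in lines)
        {
            // Normalize each tab-separated field: trim it and collapse
            // runs of whitespace to a single space.
            string[] fields = line.Split('\t');
            for (int i = 0; i < fields.Length; i++)
            {
                fields[i] = Regex.Replace(fields[i].Trim(), @"\s+", " ");
            }
            string key = string.Join("\t", fields);

            // HashSet.Add returns false if the key is already present,
            // so only the first occurrence of each entity is kept.
            if (seen.Add(key))
            {
                result.Add(line);
            }
        }
        return result;
    }
}
```

This rule would not merge "Microsoft" with "Microsoft Corporation"; matching entities like those would need a fuzzier comparison (for example, prefix or substring matching), which this sketch does not attempt.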
Answer 0 (score: 0)
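I figured it out. Just change
var classified = Classifier.classifyFile(f).toArray();
to
var classified = Classifier.classifyToCharacterOffsets(docText).toArray();
classifyFile returns a list of sentences (each a list of CoreLabel tokens), not Triples, which is why the cast fails. classifyToCharacterOffsets returns a list of Triple<String, Integer, Integer> objects holding (label, begin offset, end offset) into the string you pass in, which is exactly what the loop expects.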
Thanks.
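For the "write the results to a file" part of the question, here is a minimal sketch, assuming the character offsets have already been copied out of each Triple into a plain (label, begin, end) tuple. The EntityFormatter class and its ToLines and AppendToFile methods are hypothetical names; only standard .NET calls (String.Substring, File.AppendAllLines) are used, so the Stanford NER call appears only as a comment showing where it would plug in.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper (not part of Stanford NER): turns character-offset spans
// into "document<TAB>label<TAB>entity" lines and appends them to a file.
public static class EntityFormatter
{
    // In the real program the spans would come from the answer's call
    // (needs the Stanford NER IKVM assemblies, so it is not used here):
    //   var classified = Classifier.classifyToCharacterOffsets(docText).toArray();
    //   Triple t = (Triple)classified[i];  // first() = label, second() = begin, third() = end
    public static List<string> ToLines(
        string docNo, string docText, IEnumerable<(string Label, int Begin, int End)> spans)
    {
        var lines = new List<string>();
        foreach (var (label, begin, end) in spans)
        {
            // Same extraction as the question's loop: Substring(start, length).
            string entity = docText.Substring(begin, end - begin);
            lines.Add(docNo + "\t" + label + "\t" + entity);
        }
        return lines;
    }

    // Appends the lines to outputPath, creating the file if it does not exist,
    // so calling it once per document collects everything in one file.
    public static void AppendToFile(string outputPath, IEnumerable<string> lines)
    {
        File.AppendAllLines(outputPath, lines);
    }
}
```

Inside the question's per-file loop, you would build one tuple per Triple, such as (triple.first().ToString(), second, third), pass the list to ToLines, and call AppendToFile with the same output path for every document.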