Output "File - Token - Entity" using Stanford NER

Time: 2015-07-24 16:44:05

Tags: c# stanford-nlp named-entity-recognition

I want to use Stanford NER in C# to read all the files in a folder and write the results to a file in the format "File - Token - Entity".

This is what I have so far:

using System;
using System.IO;
using edu.stanford.nlp.ie.crf;
using edu.stanford.nlp.util;

namespace stanfordNER
{
    class Program
    {
        public static CRFClassifier Classifier = CRFClassifier.getClassifierNoExceptions(@"english.all.3class.distsim.crf.ser.gz");

        static void Main(string[] args)
        {
            Console.WriteLine("directory address?");
            string dir = Console.ReadLine();

            //Reads all files in directory
            string[] files = System.IO.Directory.GetFiles(dir);
            foreach (string f in files)
            {
                //Get the document name
                string docNo = Path.GetFileName(Path.GetFullPath(f).TrimEnd(Path.DirectorySeparatorChar));
                Console.WriteLine(docNo);

                string docText = System.IO.File.ReadAllText(f); 

                var classified = Classifier.classifyFile(f).toArray();

                // An invalid cast exception is thrown in this loop at runtime.
                // Should output the entities. This part is the work of Stewart Whiting (STEWH).
                for (int i = 0; i < classified.Length; i++)
                {
                    Triple triple = (Triple)classified[i];

                    int second = Convert.ToInt32(triple.second().ToString());
                    int third = Convert.ToInt32(triple.third().ToString());

                    Console.WriteLine(docNo + '\t' + triple.first().ToString() + '\t' + docText.Substring(second, third - second));
                }
            }
        }
    }
}

I get an invalid cast exception at the "triple" line. I don't understand how to use the Triple class.

Example of the output I want:

wiki-ms      ORGANIZATION    Microsoft Corporation
wiki-ms      LOCATION        Redmond
wiki-ms      LOCATION        Washington
wiki-ms      ORGANIZATION    Microsoft
wiki-ms      ORGANIZATION    Microsoft Office
wiki-ms      ORGANIZATION    Microsoft
wiki-ms      PERSON          Bill Gates
wiki-ms      PERSON          Paul Allen
wiki-ms      ORGANIZATION    Microsoft
wiki-ms      ORGANIZATION    Microsoft

Thanks in advance! I'm a manufacturing engineer, so my programming knowledge is pretty poor.

If you have a way to filter out duplicates and/or similar entities, that would be an added bonus!

Thanks to Stewart Whiting. His Site

1 answer:

Answer 0: (score: 0)

I figured it out. I just had to change this line:

var classified = Classifier.classifyFile(f).toArray();
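
For anyone hitting the same exception: in the Stanford NER API, `classifyFile` returns a list of sentences, each of which is a list of token labels, so its elements cannot be cast to `Triple`. A call that does return (label, start offset, end offset) triples is `classifyToCharacterOffsets`, which takes the document text rather than a file name. The sketch below rewrites the loop against that call. It assumes the IKVM-compiled Stanford NER assemblies and the classifier from the question; the `NerRows` and `PrintEntities` names are made up for illustration, and this is not necessarily the exact change made in this answer.

```csharp
using System;
using System.IO;
using edu.stanford.nlp.ie.crf;
using edu.stanford.nlp.util;

static class NerRows
{
    // Hypothetical helper: prints one "file <TAB> label <TAB> entity text" row per entity.
    public static void PrintEntities(CRFClassifier classifier, string path)
    {
        string docNo = Path.GetFileName(path);
        string docText = File.ReadAllText(path);

        // Each element is a Triple of (label, begin offset, end offset) into docText.
        object[] classified = classifier.classifyToCharacterOffsets(docText).toArray();

        foreach (object item in classified)
        {
            var triple = (Triple)item;
            int start = Convert.ToInt32(triple.second().ToString());
            int end = Convert.ToInt32(triple.third().ToString());

            Console.WriteLine(docNo + "\t" + triple.first() + "\t" + docText.Substring(start, end - start));
        }
    }
}
```

Inside the `foreach` over files in the question, this would be called as `NerRows.PrintEntities(Classifier, f);`.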

Thanks.
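
On the bonus question about filtering duplicates, here is a minimal sketch: keep a `HashSet<string>` of the rows already seen and keep a row only the first time it appears. `EntityDedup` and `DistinctRows` are hypothetical names, and the comparison assumes exact, case-sensitive matches on the whole "file, label, text" row. "Similar" entities such as `Microsoft` and `Microsoft Corporation` would need extra rules and are not handled here.

```csharp
using System;
using System.Collections.Generic;

static class EntityDedup
{
    // Returns the rows in first-seen order, dropping exact repeats.
    public static List<string> DistinctRows(IEnumerable<string> rows)
    {
        var seen = new HashSet<string>();
        var result = new List<string>();
        foreach (string row in rows)
        {
            // HashSet.Add returns false when the row is already present.
            if (seen.Add(row))
            {
                result.Add(row);
            }
        }
        return result;
    }
}
```

To use it, collect each tab-separated output line into a list instead of printing it immediately, then print the result of `EntityDedup.DistinctRows(lines)`. Passing `StringComparer.OrdinalIgnoreCase` to the `HashSet<string>` constructor would make the match case-insensitive, if that is wanted.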