Building a Protégé frame from NLP data

Date: 2017-01-26 10:48:54

Tags: java stanford-nlp ontology protege information-extraction

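I am having some problems extracting information from sentences annotated with Stanford NLP. I am new to the NLP library, and to Protégé as well! This is the text I am parsing:

String myTranslation = "Patient from the emergency room, already ventilated by garat (PS18, PEEP5, FiO2 0.6), is hospitalized. Monitoring blood pressure 77/40, heart rate 113 and undetectable SPO2. Ventilated in vacuum and then proceeded to endotracheal intubation after administration of: Midazolan 8mg, Fentanest 100, Nimbex 10mg. Tube diameter 7,5 without hole. Aspire secretions from TET, put in mechanical ventilation VT 450ml, FR 14, FiO2 0.6. Proceed to ultrasound-guided cannulation of the right internal jugular vein and request a chest X-ray";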

This is the algorithm I am using right now:

List<CoreMap> sentences = processor.getAnnotatedSentences(text);
    if (sentences != null && ! sentences.isEmpty()) {
        for(CoreMap sentence : sentences)
        {
            System.out.println("The first sentence is:");
            System.out.println(sentence.toShorterString());
            SemanticGraph sg = sentence.get(SemanticGraphCoreAnnotations.BasicDependenciesAnnotation.class);
            for(IndexedWord iWord: sg.vertexListSorted())
            {
                String value = iWord.originalText();
                String pos = iWord.get(CoreAnnotations.PartOfSpeechAnnotation.class);
                // When creating an entity frame I consider only verbs and nouns
                if(!(pos.startsWith("NN")||pos.startsWith("VB")))
                    continue;
                // Here I check whether the term I am analyzing is a medical term (checking against the UMLS Metathesaurus)
                List<Entity> entities = matcher.getEntities(value);
                Set<String> set = null;
                //for (Ev ev: entities.get(0).getEvSet()) {
                //   set = el.getSemanticTypeSet(ev.getConceptInfo().getCUI());
                //}
                //String semanticType = set.toString();
                if(entities.isEmpty())
                    continue;
                else
                {
                    // Then I collect the typed dependencies
                    Collection<TypedDependency> deps = sg.typedDependencies();
                    for(TypedDependency dep: deps)
                    {
                        System.out.println(dep.toString());
                    }
                    Frame frame = new Frame(value,"",pos);
                    // And fetch the other words I think I should relate to the one I am analyzing: for instance, if I am analyzing "patient", looking at the sentence, I should associate with it: emergency room, ventilated, garat(...)
                    Set<IndexedWord> descWords = sg.descendants(iWord);
                    // Here I try to merge compound words like "emergency" and "room" into a single one
                    addCompoundNamesToFrame("compound", frame, descWords, deps);
                    //addCompoundNamesToFrame("nummod", frame, descWords, deps);
                    for(IndexedWord desc : descWords)
                    {
                        if(desc.originalText().equals(value))
                            continue;
                        //"Stopword removal"
                        if(!filter(desc))
                        {
                            /*List<Entity> subEntities = matcher.getEntities(desc.word());
                            if(subEntities.isEmpty())
                                continue;*/
                            // I add the extracted information to the frame
                            frame.addInfo(desc.originalText(), null);
                        }
                    }
                    System.out.println("Frame:"+frame);
                }

             }
        }
    }
private void addCompoundNamesToFrame(String relName, Frame frame, Set<IndexedWord> descWords, Collection<TypedDependency> deps)
{
    for(TypedDependency dep : deps)
    {
        if(dep.reln().getShortName().equals(relName))
        {
            IndexedWord d = dep.dep();
            IndexedWord g = dep.gov();
            if(!filter(g)&&!filter(d))
            {
                frame.addInfo(d.originalText()+" "+g.originalText(), null);
            }
            descWords.remove(d);
            descWords.remove(g);
        }
    }
}

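Now, I will show my problem using the output for the first sentence:

Frame: Term: patient, [Info: emergency room, Info: from, Info: ventilated, Info: garat, Info: (, Info: PS18, Info: PEEP5, Info: FiO2, Info: 0.6, Info: )]
Frame: Term: emergency, [Info: emergency room]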

My first problem is that I would like garat (PS18, PEEP5, FiO2 0.6) to be treated as a single word in the dependencies, and I don't know how to do that.
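One possible direction, sketched here in plain Java rather than with any CoreNLP API, is to mask each "word (...)" span with a whitespace-free placeholder before the text is parsed, and to map the placeholder back afterwards. The class name `ParenGroupMasker` and the `_GROUP` suffix are made up for illustration, and the sketch assumes the tokenizer leaves an underscore-joined token intact, which would need to be checked against the tokenizer actually in use.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not part of CoreNLP): masks "word (...)" spans before parsing.
final class ParenGroupMasker {
    // A word followed by a parenthesized group that contains no nested parentheses.
    private static final Pattern GROUP = Pattern.compile("(\\w+)\\s*\\(([^()]*)\\)");

    // Maps each placeholder token back to the original text it replaced.
    private final Map<String, String> originals = new LinkedHashMap<>();

    /** Replaces every "word (...)" span with a single whitespace-free placeholder. */
    String mask(String text) {
        Matcher m = GROUP.matcher(text);
        StringBuffer out = new StringBuffer();
        int n = 0;
        while (m.find()) {
            String placeholder = m.group(1) + "_GROUP" + n++;
            originals.put(placeholder, m.group());
            m.appendReplacement(out, Matcher.quoteReplacement(placeholder));
        }
        m.appendTail(out);
        return out.toString();
    }

    /** Returns the original text for a placeholder, or the token itself if it was never masked. */
    String unmask(String token) {
        return originals.getOrDefault(token, token);
    }

    public static void main(String[] args) {
        ParenGroupMasker masker = new ParenGroupMasker();
        String masked = masker.mask("ventilated by garat (PS18, PEEP5, FiO2 0.6), is hospitalized.");
        System.out.println(masked);                        // text to hand to the parser
        System.out.println(masker.unmask("garat_GROUP0")); // original span, restored
    }
}
```

With this approach the parser sees one token per group, and the frame code would call `unmask` on each word before adding it to the frame or looking it up in UMLS. The trade-off is that the parser has never seen the placeholder word, so its part-of-speech tag may be unreliable.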

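Now, for the second sentence, I get these:

Frame: Term: intubation, [Info: endotracheal]
Frame: Term: administration, [Info: :, Info: Midazolan, Info: ,, Info: 8mg, Info: Fentanest, Info: 100, Info: Nimbex, Info: 10mg]
Frame: Term: Fentanest, [Info: 100]
Frame: Term: Nimbex, [Info: 10mg]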

From these dependencies:

 BasicDependencies=-> Ventilated/VBN (root)
  -> vacuum/NN (nmod)
    -> in/IN (case)
  -> and/CC (cc)
  -> proceeded/VBD (conj)
    -> then/RB (advmod)
    -> intubation/NN (nmod)
      -> to/TO (case)
      -> endotracheal/JJ (amod)
    -> administration/NN (nmod)
      -> after/IN (case)
      -> 8mg/NN (nmod)
        -> of/IN (case)
        -> :/: (punct)
        -> Midazolan/JJ (amod)
        -> ,/, (punct)
        -> Fentanest/NN (conj)
          -> 100/CD (nummod)
        -> ,/, (punct)
        -> Nimbex/NNP (conj)
          -> 10mg/CD (nummod)
  -> ./. (punct)

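What I would like to get instead is:

Frame: Term: intubation, [Info: endotracheal intubation, Info: Midazolan 8mg, Info: Fentanest 100, Nimbex 10mg]

So I would like to combine the names of the drugs with their dosages and treat them as important terms attached to the appropriate word!

Also, the way I combine compound words works fine, but sometimes I need to combine a compound word with a nummod. I don't know how to re-process the merged word and add it back to the collection as a single word, so that I can attach more information to it. For example:

Frame: Term: pressure, [Info: blood pressure, Info: heart rate, Info: 77/40, Info: ,, Info: 113, Info: and, Info: undetectable, Info: SPO2, Info: Monitoring,]

"blood pressure" is the combination of the two tokens "blood" and "pressure". After I merge them to add a single piece of information to my frame, I need to re-process the result to add the 77/40 nummod information related to it, but I don't know how!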
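One way to avoid re-processing a merged word, offered only as a sketch, is to collect all modifiers of a head in a single pass and emit the phrase in token order, instead of merging pairs one relation at a time. The code below uses a hypothetical `Edge` class in place of CoreNLP's `TypedDependency`; in the real code the fields would presumably come from `dep.reln().getShortName()`, `dep.gov()`, `dep.dep()` and `IndexedWord.index()`. The token indices in `main` were counted by hand from the second sentence, and the "blood pressure" edges are an assumed parse, not actual parser output.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeMap;

final class PhraseMerger {
    /** Hypothetical stand-in for one dependency edge (not a CoreNLP class). */
    static final class Edge {
        final String rel, gov, dep;
        final int govIndex, depIndex;

        Edge(String rel, String gov, int govIndex, String dep, int depIndex) {
            this.rel = rel;
            this.gov = gov;
            this.govIndex = govIndex;
            this.dep = dep;
            this.depIndex = depIndex;
        }
    }

    /** Relations whose dependent is folded into the governor's phrase. */
    static final Set<String> MERGED = new HashSet<>(Arrays.asList("compound", "amod", "nummod"));

    /** Builds one phrase per governor, with words in surface (token index) order. */
    static List<String> phrases(List<Edge> edges) {
        // governor index -> (token index -> word); TreeMap keeps both levels sorted
        TreeMap<Integer, TreeMap<Integer, String>> byHead = new TreeMap<>();
        for (Edge e : edges) {
            if (!MERGED.contains(e.rel)) {
                continue; // e.g. case, punct, conj are not part of the phrase
            }
            TreeMap<Integer, String> words = byHead.computeIfAbsent(e.govIndex, k -> new TreeMap<>());
            words.put(e.govIndex, e.gov);
            words.put(e.depIndex, e.dep);
        }
        List<String> out = new ArrayList<>();
        for (TreeMap<Integer, String> words : byHead.values()) {
            out.add(String.join(" ", words.values()));
        }
        return out;
    }

    public static void main(String[] args) {
        // Edges taken from the dependency listing of the second sentence.
        List<Edge> second = Arrays.asList(
            new Edge("case", "intubation", 9, "to", 7),
            new Edge("amod", "intubation", 9, "endotracheal", 8),
            new Edge("amod", "8mg", 15, "Midazolan", 14),
            new Edge("nummod", "Fentanest", 17, "100", 18),
            new Edge("nummod", "Nimbex", 20, "10mg", 21));
        System.out.println(phrases(second));

        // Assumed parse for "blood pressure 77/40": compound and nummod share one head.
        List<Edge> pressure = Arrays.asList(
            new Edge("compound", "pressure", 3, "blood", 2),
            new Edge("nummod", "pressure", 3, "77/40", 4));
        System.out.println(phrases(pressure));
    }
}
```

Because compound and nummod dependents of the same governor land in the same phrase, "blood", "pressure" and "77/40" are joined in one step. Attaching the drug phrases to the intubation frame would still require walking the nmod and conj edges from "administration", which this sketch does not attempt.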

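Now, I do realize that this is not trivial at all. I showed you my examples so that you can understand my problem. I am not asking you to solve it completely, but even some references, examples, or pointers would be very useful!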

0 answers:

No answers