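I am running into some problems while trying to extract information from sentences annotated with Stanford NLP. I am a newcomer to NLP libraries, very much a protégé! This is the text I am parsing:

    String myTranslation = "Patient coming from the emergency room, ventilated with garat (PS18, PEEP5, FiO2 0.6), hospitalized. Monitoring blood pressure 77/40, heart rate 113 and undetectable SPO2. Ventilated in vacuum and then proceeded to endotracheal intubation after administration of: Midazolan 8mg, Fentanest 100, Nimbex 10mg. Tube of diameter 7,5 without holes. Aspire secretions from TET, placed on mechanical ventilation VT 450ml, FR 14, FiO2 0.6. Proceed to ultrasound-guided right internal jugular vein cannulation and request a chest X-ray.";

This is the algorithm I am using at the moment: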
    List<CoreMap> sentences = processor.getAnnotatedSentences(text);
    if (sentences != null && !sentences.isEmpty()) {
        for (CoreMap sentence : sentences) {
            System.out.println("The first sentence is:");
            System.out.println(sentence.toShorterString());
            SemanticGraph sg = sentence.get(SemanticGraphCoreAnnotations.BasicDependenciesAnnotation.class);
            for (IndexedWord iWord : sg.vertexListSorted()) {
                String value = iWord.originalText();
                String pos = iWord.get(CoreAnnotations.PartOfSpeechAnnotation.class);
                // When creating an entity frame I consider only verbs and nouns
                if (!(pos.startsWith("NN") || pos.startsWith("VB")))
                    continue;
                // Here I check whether the term I am analyzing is a medical term (looked up in the UMLS Metathesaurus)
                List<Entity> entities = matcher.getEntities(value);
                Set<String> set = null;
                //for (Ev ev: entities.get(0).getEvSet()) {
                //    set = el.getSemanticTypeSet(ev.getConceptInfo().getCUI());
                //}
                //String semanticType = set.toString();
                if (entities.isEmpty())
                    continue;
                else {
                    // Then I collect the typed dependencies
                    Collection<TypedDependency> deps = sg.typedDependencies();
                    for (TypedDependency dep : deps) {
                        System.out.println(dep.toString());
                    }
                    Frame frame = new Frame(value, "", pos);
                    // And fetch the other words I think I should relate to the one I am analyzing: for instance, if I am analyzing "patient", looking at the sentence, I should associate with it: emergency room, ventilated, garat(...)
                    Set<IndexedWord> descWords = sg.descendants(iWord);
                    // Here I try to merge compound words like "emergency" and "room" into a single one
                    addCompoundNamesToFrame("compound", frame, descWords, deps);
                    //addCompoundNamesToFrame("nummod", frame, descWords, deps);
                    for (IndexedWord desc : descWords) {
                        if (desc.originalText().equals(value))
                            continue;
                        // "Stopword removal"
                        if (!filter(desc)) {
                            /*List<Entity> subEntities = matcher.getEntities(desc.word());
                            if(subEntities.isEmpty())
                                continue;*/
                            // I add the extracted information to the frame
                            frame.addInfo(desc.originalText(), null);
                        }
                    }
                    System.out.println("Frame:" + frame);
                }
            }
        }
    }

    private void addCompoundNamesToFrame(String relName, Frame frame, Set<IndexedWord> descWords, Collection<TypedDependency> deps) {
        for (TypedDependency dep : deps) {
            if (dep.reln().getShortName().equals(relName)) {
                IndexedWord d = dep.dep();
                IndexedWord g = dep.gov();
                if (!filter(g) && !filter(d)) {
                    frame.addInfo(d.originalText() + " " + g.originalText(), null);
                }
                descWords.remove(d);
                descWords.remove(g);
            }
        }
    }

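Now, let me show my problem using the output for the first sentence:

    Frame: Term: patient, [Info: emergency room, Info: coming, Info: ventilated, Info: garat, Info: (, Info: PS18, Info: PEEP5, Info: FiO2, Info: 0.6, Info: )]
    Frame: Term: emergency, [Info: emergency room]

My first problem is that I would like garat (PS18, PEEP5, FiO2 0.6) to be treated as a single word in the dependencies, and I do not know how to do this.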
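
One direction that might be worth trying here is to hide the whole parenthesised span behind a placeholder before the text reaches the parser, and to swap the original text back in when filling the frame. The following is only a minimal sketch in plain Java: `SpanMasker`, the `MASK0`-style placeholders, and the regular expression are all hypothetical names and choices, not part of Stanford NLP, and it assumes that the tokenizer keeps a plain alphanumeric placeholder as one token.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: hides "name (args)" spans behind single placeholder tokens. */
final class SpanMasker {
    // A word followed (optionally after spaces) by a parenthesised group
    // that contains no nested parentheses.
    private static final Pattern SPAN = Pattern.compile("\\w+\\s*\\([^()]*\\)");

    // placeholder -> original text, in order of appearance
    private final Map<String, String> originals = new LinkedHashMap<>();

    /** Replaces every match with MASK0, MASK1, ... and remembers the original text. */
    String mask(String text) {
        Matcher m = SPAN.matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String placeholder = "MASK" + originals.size();
            originals.put(placeholder, m.group());
            m.appendReplacement(out, placeholder);
        }
        m.appendTail(out);
        return out.toString();
    }

    /** Maps a placeholder token back to the original span; other tokens pass through unchanged. */
    String unmask(String token) {
        return originals.getOrDefault(token, token);
    }

    public static void main(String[] args) {
        SpanMasker masker = new SpanMasker();
        String masked = masker.mask(
                "Patient coming from the emergency room, ventilated with garat (PS18, PEEP5, FiO2 0.6), hospitalized.");
        System.out.println(masked);                 // text to hand to the parser
        System.out.println(masker.unmask("MASK0")); // original span for the frame
    }
}
```

In the loop above, `masker.unmask(desc.originalText())` could then be passed to `frame.addInfo(...)`. How the tagger labels such a placeholder is not guaranteed, so the part-of-speech filter may need to let placeholders through explicitly.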
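Now, for the second sentence, I get these:

    Frame: Term: intubation, [Info: endotracheal]
    Frame: Term: administration, [Info: :, Info: Midazolan, Info: ,, Info: 8mg, Info: Fentanest, Info: 100, Info: Nimbex, Info: 10mg]
    Frame: Term: Fentanest, [Info: 100]
    Frame: Term: Nimbex, [Info: 10mg]

from these dependencies:
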
    BasicDependencies=-> Ventilated/VBN (root)
      -> vacuum/NN (nmod)
        -> in/IN (case)
      -> and/CC (cc)
      -> proceeded/VBD (conj)
        -> then/RB (advmod)
        -> intubation/NN (nmod)
          -> to/TO (case)
          -> endotracheal/JJ (amod)
        -> administration/NN (nmod)
          -> after/IN (case)
          -> 8mg/NN (nmod)
            -> of/IN (case)
            -> :/: (punct)
            -> Midazolan/JJ (amod)
            -> ,/, (punct)
            -> Fentanest/NN (conj)
              -> 100/CD (nummod)
            -> ,/, (punct)
            -> Nimbex/NNP (conj)
              -> 10mg/CD (nummod)
      -> ./. (punct)

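What I would like to get instead is:

    Frame: Term: intubation, [Info: endotracheal intubation, Info: Midazolan 8mg, Info: Fentanest 100, Nimbex 10mg]

In other words, I want to join each drug name with its dosage and attach the result, as a single significant term, to the appropriate word.

Also, the way I combine compound words works fine, but sometimes I need to combine a compound with a nummod as well. I do not know how to reprocess the merged word and put it back into the collection as a single word so that I can attach more information to it. For example:

    Frame: Term: pressure, [Info: blood pressure, Info: heart rate, Info: 77/40, Info: ,, Info: 113, Info: and, Info: undetectable, Info: SPO2, Info: Monitoring, ]

"blood pressure" is the combination of the two tokens "blood" and "pressure". After merging them to add a single piece of information to my frame, I need to reprocess the result in order to add the 77/40 nummod information that relates to it, but I do not know how!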
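
One way to think about the merging step is to start from a head token and recursively pull in every dependent attached through a "gluing" relation, ordering the pieces by their position in the sentence. The sketch below is plain Java with hypothetical `Edge` and `PhraseMerger` classes standing in for the parser's edges. The set of relations to merge (`compound`, `nummod`, `amod`) is an assumption, and the sample edges are copied from the dependency output above.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeMap;

/** Hypothetical stand-in for a dependency edge; gov and dep are token positions. */
final class Edge {
    final int gov;
    final int dep;
    final String rel;
    Edge(int gov, int dep, String rel) { this.gov = gov; this.dep = dep; this.rel = rel; }
}

final class PhraseMerger {
    // Relations whose dependent is glued onto its governor (an assumption, adjust as needed).
    private static final Set<String> MERGE =
            new HashSet<>(Arrays.asList("compound", "nummod", "amod"));

    /** Returns the head plus all transitively glued dependents, in sentence order. */
    static String phrase(int head, String[] tokens, List<Edge> edges) {
        TreeMap<Integer, String> parts = new TreeMap<>(); // sorted by token position
        collect(head, tokens, edges, parts);
        return String.join(" ", parts.values());
    }

    private static void collect(int node, String[] tokens, List<Edge> edges,
                                TreeMap<Integer, String> parts) {
        if (parts.put(node, tokens[node]) != null) return; // already visited
        for (Edge e : edges) {
            if (e.gov == node && MERGE.contains(e.rel)) {
                collect(e.dep, tokens, edges, parts);
            }
        }
    }

    public static void main(String[] args) {
        String[] tokens = {"Midazolan", "8mg", ",", "Fentanest", "100", ",", "Nimbex", "10mg"};
        List<Edge> edges = Arrays.asList(
                new Edge(1, 0, "amod"), new Edge(1, 3, "conj"), new Edge(3, 4, "nummod"),
                new Edge(1, 6, "conj"), new Edge(6, 7, "nummod"));
        System.out.println(phrase(1, tokens, edges)); // head "8mg"
        System.out.println(phrase(3, tokens, edges)); // head "Fentanest"
        System.out.println(phrase(6, tokens, edges)); // head "Nimbex"
    }
}
```

Because the recursion follows glued edges transitively, a head such as "pressure" with a `compound` dependent "blood" and a `nummod` dependent "77/40" would come out as one phrase in a single pass, without reprocessing. With a `SemanticGraph`, the same walk could probably be written over `sg.outgoingEdgeIterable(word)`, checking `edge.getRelation().getShortName()` and ordering by `IndexedWord.index()`.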
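
I do realize that this is far from trivial. I have shown my example so that you can understand my problem. I am not asking you to solve it for me completely, but even a few references, examples, or pointers would be very helpful!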