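I want to find out, from a series of sentences, whether an action has already been carried out.
For example:
"I will prescribe this medication"
versus "I prescribed this medication",
or "He had already taken the stuff"
versus "he may take the stuff later".
I am trying a tidytext approach and decided to look only for past participle and future participle verbs. However, when I list the unique verb types among the POS tags, all I get is "Verb intransitive", "Verb (usu participle)" and "Verb (transitive)". How can I tell past verbs from future ones, or is there another POS tagger I could use?
I am keen to stick with tidytext because I cannot install rjava, which some of the other text-mining packages depend on.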
Answer 0 (score: 1)
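Have a look at the morphological features in the udpipe annotation. They are stored in the feats column of the annotation, and you can use cbind_morphological to add them to your dataset as extra columns.
All features are defined at https://universaldependencies.org/u/feat/index.html
In the output below, "prescribed" in "I prescribed this medication" is tagged as past tense, and so is "taken" in "He had already taken the stuff".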
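library(udpipe)

# One sentence per document
x <- data.frame(doc_id = 1:4,
                text = c("I will prescribe this medication",
                         "I prescribed this medication",
                         "He had already taken the stuff",
                         "he may take the stuff later"),
                stringsAsFactors = FALSE)

# Tokenise, tag and parse; passing a language name downloads the model if it is not available locally
anno <- udpipe(x, "english")

# Split the feats column into separate morph_* columns
anno <- cbind_morphological(anno)
anno[, c("doc_id", "token", "lemma", "feats", "morph_verbform", "morph_tense")]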
doc_id  token       lemma       feats                                                   morph_verbform  morph_tense
1       I           I           Case=Nom|Number=Sing|Person=1|PronType=Prs              <NA>            <NA>
1       will        will        VerbForm=Fin                                            Fin             <NA>
1       prescribe   prescribe   VerbForm=Inf                                            Inf             <NA>
1       this        this        Number=Sing|PronType=Dem                                <NA>            <NA>
1       medication  medication  Number=Sing                                             <NA>            <NA>
2       I           I           Case=Nom|Number=Sing|Person=1|PronType=Prs              <NA>            <NA>
2       prescribed  prescribe   Mood=Ind|Tense=Past|VerbForm=Fin                        Fin             Past
2       this        this        Number=Sing|PronType=Dem                                <NA>            <NA>
2       medication  medication  Number=Sing                                             <NA>            <NA>
3       He          he          Case=Nom|Gender=Masc|Number=Sing|Person=3|PronType=Prs  <NA>            <NA>
3       had         have        Mood=Ind|Tense=Past|VerbForm=Fin                        Fin             Past
3       already     already     <NA>                                                    <NA>            <NA>
3       taken       take        Tense=Past|VerbForm=Part                                Part            Past
3       the         the         Definite=Def|PronType=Art                               <NA>            <NA>
3       stuff       stuff       Number=Sing                                             <NA>            <NA>
4       he          he          Case=Nom|Gender=Masc|Number=Sing|Person=3|PronType=Prs  <NA>            <NA>
4       may         may         VerbForm=Fin                                            Fin             <NA>
4       take        take        VerbForm=Inf                                            Inf             <NA>
4       the         the         Definite=Def|PronType=Art                               <NA>            <NA>
4       stuff       stuff       Number=Sing                                             <NA>            <NA>
4       later       later       <NA>                                                    <NA>            <NA>
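
Note that in the annotation above no token is tagged as future: "will" and "may" are finite auxiliaries with no tense, each followed by an infinitive. So one way to turn this into a per-sentence answer might be to treat such an auxiliary plus an infinitive as "not yet done" and any Tense=Past token as "done". The following is only a rough sketch under that assumption. The helper classify_action, the pending_aux list and the labels are hypothetical names made up for illustration, not part of udpipe, and the small data frame is copied by hand from the verb and auxiliary rows of the output above so the snippet runs without downloading a model.

```r
# Verb and auxiliary rows copied by hand from the annotation output above
anno_verbs <- data.frame(
  doc_id         = c(1, 1, 2, 3, 3, 4, 4),
  token          = c("will", "prescribe", "prescribed", "had", "taken", "may", "take"),
  lemma          = c("will", "prescribe", "prescribe", "have", "take", "may", "take"),
  morph_verbform = c("Fin", "Inf", "Fin", "Fin", "Part", "Fin", "Inf"),
  morph_tense    = c(NA, NA, "Past", "Past", "Past", NA, NA),
  stringsAsFactors = FALSE
)

# ASSUMPTION: auxiliaries taken to signal an action that has not happened yet
pending_aux <- c("will", "shall", "may", "might")

# Hypothetical helper: label one document from its annotation rows
classify_action <- function(d) {
  has_past    <- any(d$morph_tense == "Past", na.rm = TRUE)
  has_pending <- any(d$lemma %in% pending_aux) &&
                 any(d$morph_verbform == "Inf", na.rm = TRUE)
  # A pending auxiliary wins over a past form; this ordering is a design choice
  if (has_pending) "pending" else if (has_past) "performed" else "unknown"
}

# Apply the helper to each document; the result is named by doc_id
status <- sapply(split(anno_verbs, anno_verbs$doc_id), classify_action)
status
```

For the four example sentences this labels documents 1 and 4 as "pending" and documents 2 and 3 as "performed". On real data you would pass the full anno data frame instead of the hand-copied one. The sketch ignores negation ("I did not prescribe") and sentences with several clauses, so treat it as a starting point only.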