Using POS tags to determine the temporality of a sentence

Date: 2019-02-18 13:22:14

Tags: r text-mining tidytext

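From a series of sentences, I would like to find out whether an action has been carried out or not. For example: "I will prescribe this medication", "I prescribed this medication", "He had already taken the stuff", "he may take the stuff later".

I am trying a tidytext approach and decided to look only for past-tense and future-tense verbs. However, when I POS-tag with tidytext, the only verb types I get are "Verb intransitive", "Verb (usu participle)" and "Verb (transitive)". How can I tell past verbs from future ones, or is there another POS tagger I can use?

I am keen to use tidytext because I cannot install rJava, which some of the other text-mining packages depend on.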

1 answer:

Answer 0: (score: 1)

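Have a look at the morphological features in the udpipe annotation. They are stored in the feats column of the annotation, and you can add them to the dataset as extra columns with cbind_morphological. All features are defined at https://universaldependencies.org/u/feat/index.html

In the output below, "prescribed" in "I prescribed this medication" is marked as past tense, and so are "had" and "taken" in "He had already taken the stuff". Note that "will prescribe" and "may take" carry no Tense feature at all: the modal is tagged VerbForm=Fin and the main verb VerbForm=Inf.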

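library(udpipe)

# One row per sentence; udpipe() expects the columns doc_id and text
x <- data.frame(doc_id = 1:4, 
                text = c("I will prescribe this medication", 
                         "I prescribed this medication", 
                         "He had already taken the stuff", 
                         "he may take the stuff later"), 
                stringsAsFactors = FALSE)

# Annotate with the English model (downloaded on first use if not found locally)
anno <- udpipe(x, "english")

# Split the feats column into separate morph_* columns
anno <- cbind_morphological(anno)

anno[, c("doc_id", "token", "lemma", "feats", "morph_verbform", "morph_tense")]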

 doc_id      token      lemma                                                  feats morph_verbform morph_tense
      1          I          I             Case=Nom|Number=Sing|Person=1|PronType=Prs           <NA>        <NA>
      1       will       will                                           VerbForm=Fin            Fin        <NA>
      1  prescribe  prescribe                                           VerbForm=Inf            Inf        <NA>
      1       this       this                               Number=Sing|PronType=Dem           <NA>        <NA>
      1 medication medication                                            Number=Sing           <NA>        <NA>
      2          I          I             Case=Nom|Number=Sing|Person=1|PronType=Prs           <NA>        <NA>
      2 prescribed  prescribe                       Mood=Ind|Tense=Past|VerbForm=Fin            Fin        Past
      2       this       this                               Number=Sing|PronType=Dem           <NA>        <NA>
      2 medication medication                                            Number=Sing           <NA>        <NA>
      3         He         he Case=Nom|Gender=Masc|Number=Sing|Person=3|PronType=Prs           <NA>        <NA>
      3        had       have                       Mood=Ind|Tense=Past|VerbForm=Fin            Fin        Past
      3    already    already                                                   <NA>           <NA>        <NA>
      3      taken       take                               Tense=Past|VerbForm=Part           Part        Past
      3        the        the                              Definite=Def|PronType=Art           <NA>        <NA>
      3      stuff      stuff                                            Number=Sing           <NA>        <NA>
      4         he         he Case=Nom|Gender=Masc|Number=Sing|Person=3|PronType=Prs           <NA>        <NA>
      4        may        may                                           VerbForm=Fin            Fin        <NA>
      4       take       take                                           VerbForm=Inf            Inf        <NA>
      4        the        the                              Definite=Def|PronType=Art           <NA>        <NA>
      4      stuff      stuff                                            Number=Sing           <NA>        <NA>
      4      later      later                                                   <NA>           <NA>        <NA>
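
To get from these columns to a per-sentence "done or not" label, one possible follow-up is sketched below. It is only a rough heuristic, not something udpipe provides: the verb rows are copied by hand from the output above so the snippet runs without downloading a model, and the names future_modals and classify_doc as well as the labels "done" / "not yet done" are made up for this illustration.

```r
# Verb rows copied from the annotation output above (hand-typed, base R only)
verbs <- data.frame(
  doc_id         = c(1, 1, 2, 3, 3, 4, 4),
  token          = c("will", "prescribe", "prescribed", "had", "taken", "may", "take"),
  lemma          = c("will", "prescribe", "prescribe", "have", "take", "may", "take"),
  morph_verbform = c("Fin", "Inf", "Fin", "Fin", "Part", "Fin", "Inf"),
  morph_tense    = c(NA, NA, "Past", "Past", "Past", NA, NA),
  stringsAsFactors = FALSE
)

# Assumption: a small, hypothetical list of modals that signal a future or possible action
future_modals <- c("will", "shall", "may", "might")

# Label one sentence from its verb rows
classify_doc <- function(d) {
  has_past  <- any(d$morph_tense == "Past", na.rm = TRUE)
  has_modal <- any(d$lemma %in% future_modals)
  has_inf   <- any(d$morph_verbform == "Inf", na.rm = TRUE)
  if (has_past) {
    "done"
  } else if (has_modal && has_inf) {
    "not yet done"
  } else {
    "unknown"
  }
}

# Apply the rule per sentence; the result is a character vector named by doc_id
status <- sapply(split(verbs, verbs$doc_id), classify_doc)
status
```

On real data you would run the same function on the anno data frame instead of the hand-typed one. Expect to refine the rule: a sentence such as "I would have prescribed it" contains a past participle although nothing was prescribed, so a single Tense=Past check is not enough in general.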