Quanteda: document-feature matrix with a predefined feature set

Time: 2017-10-05 21:33:53

Tags: r text-mining quanteda

I am using quanteda to build two document-feature matrices:

library(quanteda)
DFM1 <- dfm("this is a rock")
#        features
# docs    this is a rock
#   text1    1  1 1    1
DFM2 <- dfm("this is music")
#        features
# docs    this is music
#   text1    1  1     1

However, I would like DFM2 to have a specific set of features, namely those of DFM1:

DFM2 <- dfm("this is music", *magicargument* = featnames(DFM1))
#        features
# docs    this is a rock
#   text1    1  1 0    0

Is there a magic argument I am missing? Or is there another efficient way to achieve this for a large number of words?

1 Answer:

Answer 0: (score: 2)

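The magic argument is pattern, and it belongs to dfm_select() rather than dfm(). You can pass it a dfm whose features will be matched; features that are missing from the target dfm are added as zero counts: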

dfm_select(DFM2, pattern = DFM1)
# Document-feature matrix of: 1 document, 4 features (50% sparse).
# 1 x 4 sparse Matrix of class "dfmSparse"
#        features
# docs    this is a rock
#   text1    1  1 0    0
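
For readers on a newer quanteda release, here is a minimal sketch of the same idea using dfm_match(). It is not part of the original answer, and it assumes a quanteda version in which dfm() is built from tokens() and dfm_match() is available; check the documentation for the version you have installed. The object name DFM2_matched is only illustrative.

```r
library(quanteda)

# Assumption: a recent quanteda release, where dfm() is built from a tokens
# object rather than directly from a character vector.
DFM1 <- dfm(tokens("this is a rock"))
DFM2 <- dfm(tokens("this is music"))

# Conform DFM2 to the feature set of DFM1:
#   - features of DFM1 that DFM2 lacks ("a", "rock") become zero columns
#   - features of DFM2 that DFM1 lacks ("music") are dropped
DFM2_matched <- dfm_match(DFM2, features = featnames(DFM1))

# Inspect the result: same columns as DFM1, counts taken from DFM2.
print(featnames(DFM2_matched))
print(as.matrix(DFM2_matched))
```

Because only a character vector of feature names is passed, the same call should also work when the reference feature set comes from somewhere other than a dfm, for example the vocabulary of a previously trained model.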