Quanteda kwic: appending data to the output

Time: 2016-09-12 19:55:24

Tags: r

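I would like to attach some metadata to the kwic output, such as a customer ID (see below), so that rows can easily be looked up in the master file. I tried appending the data with cbind, but the rows did not line up correctly.

Any help would be much appreciated.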

     docname    position    contextPre      keyword    contextPost          CustID
     text3790     5    nothing at all looks  good   and sounds great           1
     text3801    11    think the offer is a  good   value and has a lot        3
     text3874    10    not so sure thats a   good   word to use                5

Source data.frame

       CustID   Comment
         1      nothing at all looks good and sounds great
         2      did not see anything that was very appealing
         3      I think the offer is a good value and has a lot of potential
         4      these items look terrible how are you still in business
         5      not so sure thats a good word to use
         6      having a hard time believing some place would sell an item so low
         7      it may be worth investing in some additional equipment

2 answers:

Answer 0 (score: 5)

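At first I thought the ideal solution would be to use docvars, but kwic does not seem to have an option to show them. So the docname-to-ID mapping table still has to be merged with the kwic result: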

library(data.table)
library(quanteda)

s <- "CustID,   Comment
1,      nothing at all looks good and sounds great
2,      did not see anything that was very appealing
3,      I think the offer is a good value and has a lot of potential
4,      these items look terrible how are you still in business
5,      not so sure thats a good word to use
6,      having a hard time believing some place would sell an item so low
7,      it may be worth investing in some additional equipment"

# data.table is used here mainly to read the data easily;
# data.table = FALSE returns a plain data frame.
df <- fread(s, data.table = FALSE)

# All operations below apply to the data frame.
myCorpus <- corpus(df$Comment)
# The corpus and CustID come from the same data frame,
# which ensures the mapping is correct.
docvars(myCorpus, "CustID") <- df$CustID
summary(myCorpus)
# Build the mapping table of docname and CustID.
# docnames() gives the document names as an explicit column.
dv_table <- docvars(myCorpus)
id_table <- data.frame(docname = docnames(myCorpus), CustID = dv_table$CustID)
# Newer quanteda releases expect a tokens object here;
# older releases also accepted the corpus directly.
result <- kwic(tokens(myCorpus), "good", window = 3, valuetype = "glob")
# Convert to a plain data frame before merging on docname.
id_result <- merge(as.data.frame(result), id_table, by = "docname")
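
The reason a merge works where cbind does not can be seen in a small base R sketch. kwic returns one row per keyword match, so the result has fewer rows than the source data frame, and cbind has nothing to align on. The `kwic_like` data frame below is a hand-built, hypothetical stand-in for real kwic output (it is not produced by quanteda), so the example runs without any packages.

```r
# Hypothetical stand-in for kwic output: one row per keyword match,
# so only 3 of the 7 comments appear.
kwic_like <- data.frame(
  docname = c("text1", "text3", "text5"),
  keyword = c("good", "good", "good"),
  stringsAsFactors = FALSE
)

# Mapping table: one row per document in the corpus.
id_table <- data.frame(
  docname = paste0("text", 1:7),
  CustID  = 1:7,
  stringsAsFactors = FALSE
)

# Join on docname; documents with no keyword match are dropped.
id_result <- merge(kwic_like, id_table, by = "docname")
print(id_result)
```

Because the join key is the document name rather than the row position, the IDs stay correct no matter how many documents contain the keyword.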

Answer 1 (score: 1)

The kwic result is a data.frame object, so you can add a column in the usual way:

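library(quanteda)
# inaugTexts shipped with older quanteda releases;
# newer releases provide data_corpus_inaugural instead.
h <- head(kwic(inaugTexts, "secure*", window = 3, valuetype = "glob"))

# Add a new ID column
h$CustID <- seq_len(nrow(h))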
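
A sequential column like this numbers the matches rather than identifying customers. If the goal is to carry over the original CustID, one possible approach is a lookup by docname with a named vector, which adds the column in place and keeps the existing row order. This is only a sketch: `h` below is a hypothetical hand-built data frame standing in for a kwic result, and `cust_ids` is an illustrative name, not part of quanteda.

```r
# Hypothetical kwic-like result, deliberately not sorted by docname.
h <- data.frame(
  docname = c("text5", "text1", "text3"),
  keyword = "good",
  stringsAsFactors = FALSE
)

# Named lookup vector: names are docnames, values are customer IDs.
cust_ids <- setNames(1:7, paste0("text", 1:7))

# Index the lookup vector by docname; row order of h is preserved.
h$CustID <- unname(cust_ids[h$docname])
print(h)
```

Unlike merge, which sorts the result by the join key by default, this keeps the rows in the order kwic returned them.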