Question

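I have a CSV file in which all my documents are stored in term-document-matrix form, together with a categorical variable holding the sentiment. I'd like to use tm's functionality (term frequencies, etc.). Is there a way to do that, given the data I'm starting with?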

# given:

dtm = read.csv(file_path, na.strings="") 
dtm$rating = as.factor(dtm$rating)

str(dtm)
# 'data.frame': 2000 obs. of  2002 variables:
# $ ID           : int  1 2 3 4 5 6 7 8 9 10 ...
# $ abl          : int  0 0 0 0 0 0 0 0 0 0 ...
# ...

head(dtm)
#ID abl absolut absorb accept 
#1  1   0       0      0      
#2  2   0       0      1

# I'd like to achieve...
tdm <- TermDocumentMatrix(dtm,
                          control = list(removePunctuation = TRUE,
                                         stopwords = TRUE))

Answer 1

Could you use as.TermDocumentMatrix(df, weighting = weightTf) (from the R package tm) to do what you want?
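
A minimal sketch of that idea, under some assumptions: the toy data frame below only stands in for the CSV from the question, and the names dtm_df, term_cols and m are illustrative. Because the data has one row per document and one column per term, it is shaped like a document-term matrix, so the sketch converts it with as.DocumentTermMatrix() and transposes it for as.TermDocumentMatrix(). The ID and rating columns are dropped first, since as.matrix() on a data frame containing a factor column yields a character matrix rather than counts.

```r
library(tm)

# Toy stand-in for the CSV in the question (assumed layout:
# an ID column, one count column per term, and a rating factor).
dtm_df <- data.frame(
  ID     = 1:3,
  abl    = c(0L, 0L, 1L),
  absorb = c(0L, 1L, 2L),
  rating = factor(c("pos", "neg", "pos"))
)

# Keep only the term-count columns; ID and rating are not terms.
term_cols <- setdiff(names(dtm_df), c("ID", "rating"))
m <- as.matrix(dtm_df[, term_cols])
rownames(m) <- dtm_df$ID

# Rows are documents, columns are terms: a document-term matrix.
dtm <- as.DocumentTermMatrix(m, weighting = weightTf)

# Transpose to get terms as rows for a term-document matrix.
tdm <- as.TermDocumentMatrix(t(m), weighting = weightTf)

# tm helpers now work on the converted objects.
freq_terms <- findFreqTerms(dtm, lowfreq = 2)
```

The rating column can be kept alongside as a separate vector (for example dtm_df$rating), matched to the documents by row order.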

Converting a term-document matrix into a term-document matrix supported by the tm library
