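My text file has three columns: document ID, term ID, and term frequency. Is there an R function that converts this data into a document-term matrix?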
Answer 0 (score: 2)
For example:
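# Example input: one row per (document, term) pair with its frequency
df <- read.table(header=TRUE, text='"doc" "term" "freq"
1 "foo" 1
1 "bar" 2
2 "hello" 1
2 "world" 2')
library(tm)
# Cross-tabulate freq by doc and term, then convert the table to a DocumentTermMatrix
dtm <- as.DocumentTermMatrix(xtabs(freq~doc+term, df), weighting=weightTf)
as.matrix(dtm)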
#     Terms
# Docs bar foo hello world
#    1   2   1     0     0
#    2   0   0     1     2
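If the file is large, the intermediate table built by xtabs is dense and may not fit in memory. A possible alternative, offered only as a sketch, is to build the sparse triplet matrix directly with the slam package (which tm depends on) and convert that. It assumes each (doc, term) pair appears at most once in the input, since duplicate pairs are not summed here; the variable names docs, terms, and stm are illustrative.

```r
library(slam)
library(tm)

# Same example data as above, one row per (document, term) pair
df <- data.frame(doc  = c(1, 1, 2, 2),
                 term = c("foo", "bar", "hello", "world"),
                 freq = c(1, 2, 1, 2),
                 stringsAsFactors = FALSE)

# Factors give each document and term a stable integer index
docs  <- factor(df$doc)
terms <- factor(df$term)

# Sparse matrix in triplet form: row index, column index, value
# (assumption: no repeated (doc, term) pairs in df)
stm <- simple_triplet_matrix(i = as.integer(docs),
                             j = as.integer(terms),
                             v = df$freq,
                             nrow = nlevels(docs),
                             ncol = nlevels(terms),
                             dimnames = list(Docs = levels(docs),
                                             Terms = levels(terms)))

dtm <- as.DocumentTermMatrix(stm, weighting = weightTf)
```

Calling as.matrix(dtm) on this small example should give the same counts as the xtabs version; for real data you would normally keep the result sparse.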