CLUTO Document Term Matrix to tm DocumentTermMatrix

Asked: 2013-04-02 14:55:12

Tags: r text-mining tm cluto

I have a document-term matrix in CLUTO format:

#Documents #Terms #TotalNonzeroItems
term-x weight-x term-y weight-y (nonzero terms only, one row per document)

I would like to create a DocumentTermMatrix (tm package) directly from this file rather than from a corpus. Is this possible?

Cluto File:
2 3 3
1 3 3 4
2 8

Row File:
car
plane

Column File:
x
y
z
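
To make the format concrete, here is a minimal base-R sketch that expands the sample CLUTO file above into a dense matrix. The helper name `parse_cluto_sparse` is hypothetical and not part of slam or tm; it assumes 1-based term indices and one line per document, as in the format description.

```r
# Hypothetical helper (not part of slam or tm): parse CLUTO sparse-matrix
# lines into a dense base-R matrix, purely to illustrate the format.
parse_cluto_sparse <- function(lines) {
  # Header: number of documents, number of terms, number of nonzero entries.
  header <- as.integer(strsplit(trimws(lines[1]), "\\s+")[[1]])
  n_docs <- header[1]
  n_terms <- header[2]
  m <- matrix(0, nrow = n_docs, ncol = n_terms)
  for (i in seq_len(n_docs)) {
    fields <- as.numeric(strsplit(trimws(lines[i + 1]), "\\s+")[[1]])
    if (length(fields) == 0) next  # document with no nonzero terms
    term_idx <- fields[seq(1, length(fields), by = 2)]  # odd positions: term index
    weights <- fields[seq(2, length(fields), by = 2)]   # even positions: weight
    m[i, term_idx] <- weights
  }
  m
}

# Sample data copied from the files shown above.
cluto_lines <- c("2 3 3", "1 3 3 4", "2 8")
m <- parse_cluto_sparse(cluto_lines)
dimnames(m) <- list(c("car", "plane"), c("x", "y", "z"))
print(m)
```

For the sample, this gives car the weights x = 3 and z = 4, and plane the weight y = 8, with every other cell zero.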

Solution:

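library(slam)  # provides read_stm_CLUTO()
library(tm)    # provides as.DocumentTermMatrix() and weightTf

# `file` holds the path to the CLUTO matrix file
dtm <- as.DocumentTermMatrix(read_stm_CLUTO(file), weightTf)
rows <- scan("rows.txt", what = "", sep = "\n")
columns <- scan("columns.txt", what = "", sep = "\n")

dtm$dimnames <- list(Docs = rows, Terms = columns)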
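
Applied to the sample files from the question, the whole pipeline might look like the sketch below. It assumes the slam and tm packages are installed; the temporary file paths and the variable names `cluto_file`, `rows_file`, `cols_file` and `dense` are illustrative only.

```r
library(slam)  # read_stm_CLUTO()
library(tm)    # as.DocumentTermMatrix(), weightTf

# Write the sample data to temporary files (paths are illustrative only).
cluto_file <- tempfile(fileext = ".mat")
rows_file <- tempfile(fileext = ".txt")
cols_file <- tempfile(fileext = ".txt")
writeLines(c("2 3 3", "1 3 3 4", "2 8"), cluto_file)
writeLines(c("car", "plane"), rows_file)
writeLines(c("x", "y", "z"), cols_file)

# Read the sparse matrix and convert it to a DocumentTermMatrix.
dtm <- as.DocumentTermMatrix(read_stm_CLUTO(cluto_file), weightTf)

# Attach the document and term labels read from the label files.
rows <- scan(rows_file, what = "", sep = "\n", quiet = TRUE)
columns <- scan(cols_file, what = "", sep = "\n", quiet = TRUE)
dtm$dimnames <- list(Docs = rows, Terms = columns)

# Densify only for inspection; real matrices should stay sparse.
dense <- as.matrix(dtm)
print(dense)
```

Converting to a dense matrix is done here only to check that the labels line up with the weights; for a real corpus the sparse DocumentTermMatrix is usually the object to work with.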

1 Answer:

Answer 0 (score: 1):

This should do it:

library(slam)  # provides read_stm_CLUTO()
library(tm)    # provides as.DocumentTermMatrix() and weightTf
as.DocumentTermMatrix(read_stm_CLUTO(file), weightTf)

If you can link to your CLUTO file or add an excerpt of it to the question, we can take a look at the row and column names.

Hat tip: https://r-forge.r-project.org/scm/viewvc.php/pkg/R/foreign.R?root=tm&view=diff&r1=1127&r2=1127&diff_format=s