R: sparse matrix (simple_triplet_matrix - TermDocumentMatrix) too large to be processed

Asked: 2015-02-19 13:01:53

Tags: r bigdata text-mining tm

I have a TermDocumentMatrix tdm (terms: 66779, documents: 609551, non-sparse entries: 9704315). I am trying to process it with the code listed below:

# 1. counting sum of term values for each document
colTotals = apply(tdm , 2, sum)

# 2. Singular Value Decomposition 
s = svd(as.matrix(tdm), nu = nrow(tdm), nv = ncol(tdm))

# 3. Latent Semantic Analysis (with lsa package)
sp = lsa(tdm)

Each of the calls listed above (1, 2, 3) fails with the same error:

Error in vector(typeof(x$v), nr * nc) : vector size cannot be NA
In addition: Warning message:
In nr * nc : NAs produced by integer overflow
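
The warning suggests that the product of the two dimensions no longer fits in R's 32-bit integer type, which is what would happen if the sparse matrix were being converted to a dense one. A minimal sketch of that arithmetic, using only the dimensions quoted above (the variable names are illustrative, and the memory estimate assumes 8-byte doubles):

```r
# Dimensions reported for tdm in the question
nr <- 66779L    # terms (rows)
nc <- 609551L   # documents (columns)

# Integer multiplication overflows and yields NA (the warning is silenced here)
cells_int <- suppressWarnings(nr * nc)

# The same product computed in double precision
cells_dbl <- as.numeric(nr) * as.numeric(nc)

# Largest value an R integer can hold, for comparison
int_max <- .Machine$integer.max

# Approximate size of a dense matrix of doubles, in GiB (assumes 8 bytes per cell)
dense_gib <- cells_dbl * 8 / 2^30

print(cells_int)
print(cells_dbl > int_max)
print(round(dense_gib))
```

Under these assumptions the dense matrix would need about 4.07e10 cells, roughly 300 GiB, while only 9704315 entries are actually non-zero.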

How can I process such a large matrix?
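
For step 1 at least, the totals can in principle be computed from the stored (row, column, value) triplets alone, without ever building the dense matrix. The following is a minimal base-R sketch on a toy list whose field names (i, j, v, nrow, ncol) mirror those of a simple_triplet_matrix; the helper triplet_col_sums is hypothetical and only illustrates the idea. The slam package, on which tm builds, provides col_sums() for this purpose.

```r
# Toy matrix in triplet form: i = row (term) index, j = column (document)
# index, v = stored value. Field names mirror slam's simple_triplet_matrix.
toy <- list(
  i = c(1L, 3L, 2L, 1L),
  j = c(1L, 1L, 2L, 4L),
  v = c(2, 1, 5, 3),
  nrow = 3L,
  ncol = 4L
)

# Hypothetical helper: sum the stored values per column, staying sparse
triplet_col_sums <- function(m) {
  totals <- numeric(m$ncol)          # columns with no entries stay 0
  agg <- rowsum(m$v, group = m$j)    # one row per column that has entries
  totals[as.integer(rownames(agg))] <- agg[, 1]
  totals
}

col_totals <- triplet_col_sums(toy)
print(col_totals)
```

The work here scales with the number of non-sparse entries rather than with rows times columns. This sketch does not address steps 2 and 3, which request full-size singular vectors.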

0 answers:

No answers yet