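I am trying to learn how to use the lsa package in R. My real dataset is much larger than the example below, but I am using this one for reproducibility (props to this person for posting the code on his website; it is a great resource).
I am getting a strange error message that I cannot seem to resolve: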
Error in Ops.simple_triplet_matrix(m, 1) : Incompatible dimensions.
Here is the code I am adapting:
# load required libraries
library(tm)
library(ggplot2)
library(lsa)
library(SnowballC)
lsa <- function () {
  # 1. Prepare mock data
  text <- c("transporting food by cars will cause global warming. so we should go local.",
            "we should try to convince our parents to stop using cars because it will cause global warming.",
            "some food, such as mongo, requires a warm weather to grow. so they have to be transported to canada.",
            "a typical Electronic Circuit can be built with a battery, a bulb, and a switch.",
            "electricity flows from batteries to the bulb, just like water flows through a tube.",
            "batteries have chemical energe in it. then electrons flow through a bulb to light it up.",
            "birds can fly because they have feather and they are light.", "why some birds like pigeon can fly while some others like chicken cannot?",
            "feather is important for birds' fly. if feather on a bird's wings is removed, this bird cannot fly.")
  view <- factor(rep(c("view 1", "view 2", "view 3"), each = 3))
  df <- data.frame(text, view, stringsAsFactors = FALSE)
  # prepare corpus
  corpus <- Corpus(VectorSource(df$text))
  # corpus <- tm_map(corpus, tolower)
  # corpus <- tm_map(corpus, removePunctuation)
  # corpus <- tm_map(corpus, function(x) removeWords(x, stopwords("english")))
  # corpus <- tm_map(corpus, stemDocument, language = "english")
  corpus <- tm_map(corpus, PlainTextDocument)
  # 2. MDS with raw term-document matrix compute distance matrix
  td.mat <- TermDocumentMatrix(corpus)
  td.mat.lsa <- lw_logtf(td.mat) * gw_idf(td.mat) # weighting
  lsaSpace <- lsa(td.mat.lsa) # create LSA space
  dist.mat.lsa <- dist(t(as.textmatrix(lsaSpace))) # compute distance matrix
  return(dist.mat.lsa) # check distance matrix
}
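I can build the corpus without any problem, and I can convert it to a term-document matrix. The error is triggered when I define td.mat.lsa.
The traceback is as follows: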
4 stop("Incompatible dimensions.")
3 Ops.simple_triplet_matrix(m, 1)
2 lw_logtf(td.mat) at lsa.R#31
1 lsa()
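So, my main question is:
Thanks in advance for any help you can provide; this is my first post, so feedback on the quality of my question is welcome too!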
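Answer 0 (score: 0)
Figured it out!
I had wrapped my code in a function named 'lsa' and was also using 'lsa' as a name inside the function body. That is why I got incompatible dimensions: within that environment, lsa referred to a differently defined function.
Phew!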
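For anyone who runs into the same thing, here is a minimal base-R sketch of the name collision described above. It does not use the lsa package: `double_it` is a hypothetical stand-in for a package function such as `lsa::lsa()`, and `shadowed` and `run_analysis` are made-up names for illustration only.

```r
# Hypothetical stand-in for a package function such as lsa::lsa()
double_it <- function(x) x * 2

# Problem pattern: a zero-argument wrapper that reuses the same name.
# Inside the wrapper, the call double_it(1:3) resolves to the wrapper
# itself, not to the original function, so the call fails.
shadowed <- local({
  double_it <- function() double_it(1:3)
  double_it
})

# Fix: give the wrapper a distinct name so the inner call finds the original.
run_analysis <- function() double_it(1:3)

# Catch the failure instead of stopping the script
bad_result  <- tryCatch(shadowed(), error = function(e) "error")
good_result <- run_analysis()
```

In the question's code, the equivalent change would be renaming the wrapper (for example to `run_lsa`, a name chosen here purely for illustration) or calling the package function with an explicit namespace as `lsa::lsa(td.mat.lsa)`.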