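I have a list of n files that I am trying to iterate over, adding each one to the result so that together they form a single VCorpus. Normally you can use the c() operator to join two VCorpus objects into a larger one. However, if I initialize the accumulator first as shown below, the corpus gets converted into a character vector. If I don't initialize it first, I get an error.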
clean_corpus <- c()
for (i in directory_source$filelist) {
    conn <- file(i, "r")
    filebuffer <- readLines(conn, encoding = "UTF-8", skipNul = TRUE)
    close(conn)
    set.seed(3413)
    sampled_buffer <- sample(filebuffer, size = round(length(filebuffer) * fraction, digits = 0))
    sample_corpus <- VCorpus(VectorSource(sampled_buffer))
    clean_corpus <- c(clean_corpus, sample_corpus, recursive = TRUE)
}
Answer 0 (score: 0)
This is untested, but it might work:
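# Read one file, sample a fraction of its lines, and return them as a VCorpus.
# fraction is looked up in the calling environment, as in the question.
f <- function(i) {
    conn <- file(i, "r")
    filebuffer <- readLines(conn, encoding = "UTF-8", skipNul = TRUE)
    close(conn)
    set.seed(3413)
    sampled_buffer <- sample(filebuffer, size = round(length(filebuffer) * fraction, digits = 0))
    # Return the per-file corpus; the combining happens once, after lapply().
    VCorpus(VectorSource(sampled_buffer))
}
corpus_list <- lapply(directory_source$filelist, f)
# Every element is a VCorpus, so c() uses the VCorpus method and returns a VCorpus.
corpus_agg <- do.call(c, corpus_list)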
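If you would rather keep the for loop from the question, here is a minimal sketch of one way to do it. My understanding is that c() picks its method from the class of its first argument, so starting from an empty c() (which is NULL) falls through to the default method, and recursive = TRUE then flattens the corpus. The texts list below is made-up in-memory data standing in for the lines read from directory_source$filelist, and the sampling step is left out for brevity.

```r
library(tm)

# Hypothetical stand-in for the lines read from each file in the question.
texts <- list(c("alpha beta", "gamma delta"), c("epsilon zeta"))

# What the question does: the first argument is NULL, so the default c()
# method runs and the VCorpus class is lost.
flattened <- c(c(), VCorpus(VectorSource(texts[[1]])), recursive = TRUE)
class(flattened)

# Loop-based alternative: only call c() once a corpus already exists,
# so the first argument is always a VCorpus.
clean_corpus <- NULL
for (txt in texts) {
    sample_corpus <- VCorpus(VectorSource(txt))
    clean_corpus <- if (is.null(clean_corpus)) {
        sample_corpus
    } else {
        c(clean_corpus, sample_corpus)
    }
}

inherits(clean_corpus, "VCorpus")
length(clean_corpus)
```

The is.null() check only matters on the first iteration. After that, both arguments to c() are VCorpus objects and recursive = TRUE is not needed.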