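I am trying to import unstructured text files into R. After importing a txt file, I want to split it into paragraphs; I do not want to split it into lines. Think of each txt file as an article. My code is: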
library(tm)  # provides Corpus() and VectorSource()
setwd('F:/My files/specification/files/text')  # set the path to desired folder
rm(dataset)
file_list <- list.files('F:/My files/specification/files/text')
for (file in file_list) {
  # if the merged dataset doesn't exist, create it
  if (!exists("dataset")) {
    dataset <- scan(file, sep = '\n',
                    what = list(case = character(), value = character()),
                    strip.white = TRUE, blank.lines.skip = TRUE)
  }
  # if the merged dataset does exist, append to it
  if (exists("dataset")) {
    temp_dataset <- scan(file, sep = '\n',
                         what = list(case = character(), value = character()),
                         strip.white = TRUE, blank.lines.skip = TRUE)
    names(dataset) <- names(temp_dataset)
    dataset <- rbind(dataset, temp_dataset)
    rm(temp_dataset)
  }
}
write.table(dataset, "10File.txt", sep = "\t")
x <- readLines("10File.txt")  # read data with readLines
#x <- scan("F:/My files/specification/files/text/10File.txt", what="character", sep="\n")
a <- strsplit(x, "\\n\\n")
p_corpus <- Corpus(VectorSource(a))
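Even when I have only one file, the result is wrong: the output file contains that file's data twice. Please help me import these files and then split them into paragraphs.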
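
For what it is worth, two things in the code above seem to explain the symptoms. In the first loop iteration the first `if` creates `dataset`, so the second `if (exists("dataset"))` is then also true and the same file is read and appended again; an `else` would avoid that. Also, `blank.lines.skip = TRUE` and `readLines` both discard the blank lines that separate paragraphs, so `"\\n\\n"` can never match. A minimal sketch of one possible approach follows, assuming paragraphs are separated by at least one blank (or whitespace-only) line; `read_paragraphs` is a hypothetical helper name, not something from the original code.

```r
# Hypothetical helper: read one text file and return its paragraphs.
# Assumption: paragraphs are separated by one or more blank lines.
read_paragraphs <- function(path) {
  lines <- readLines(path, warn = FALSE)
  # Rebuild the whole file as a single string so blank lines survive.
  text <- paste(lines, collapse = "\n")
  # Split on a newline, optional whitespace, and another newline.
  paras <- strsplit(text, "\\n\\s*\\n", perl = TRUE)[[1]]
  paras <- trimws(paras)
  # Drop empty pieces (e.g. from leading blank lines or an empty file).
  paras[nzchar(paras)]
}

# Folder taken from the question; each file is read exactly once.
text_dir <- "F:/My files/specification/files/text"
files <- list.files(text_dir, pattern = "\\.txt$", full.names = TRUE)
paragraphs <- unlist(lapply(files, read_paragraphs), use.names = FALSE)
```

The resulting character vector has one element per paragraph, so it could be passed on as `Corpus(VectorSource(paragraphs))` with the tm package loaded, without writing an intermediate file first.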