Splitting an unstructured txt file into paragraphs

Time: 2019-04-08 11:05:51

Tags: r

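I am trying to import an unstructured text file into R. After importing the txt file, I want to split it into paragraphs, not into individual lines. Think of the txt file as an article. My code is: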

setwd('F:/My files/specification/files/text')# set the path to desired folder
rm(dataset)
file_list <- list.files('F:/My files/specification/files/text')
for (file in file_list){

  # if the merged dataset doesn't exist, create it
  if (!exists("dataset")){
    dataset <- scan(file, sep = '\n', 
                    what = list(case = character(), value = character()), 
                    strip.white = TRUE, blank.lines.skip = TRUE)
  }

  # if the merged dataset does exist, append to it
  if (exists("dataset")){
    temp_dataset <-scan(file, sep = '\n', 
                        what = list(case = character(), value = character()), 
                        strip.white = TRUE, blank.lines.skip = TRUE)
    names(dataset) <- names(temp_dataset) 
    dataset<-rbind(dataset, temp_dataset)
    rm(temp_dataset)
  }

}
write.table(dataset,"10File.txt",sep="\t")
x <- readLines("10File.txt")  # read data with readLines
#x <- scan("F:/My files/specification/files/text/10File.txt", what="character", sep="\n")
a <- strsplit(x, "\\n\\n")


p_corpus <- Corpus(VectorSource(a))

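When I have only one file, the code goes wrong: the file written out contains the data twice. Please help me import this file and then split it into paragraphs.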
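A possible direction, offered as a sketch rather than a confirmed answer. The duplication most likely comes from the two separate `if` blocks in the loop: on the first file the first block creates `dataset`, so `exists("dataset")` is then true and the second block appends the same file again (an `else` would avoid that). Separately, `blank.lines.skip = TRUE` drops the blank lines that mark paragraph boundaries, and `readLines()` returns one element per line, so no element contains `"\n\n"` for `strsplit()` to match. The code below assumes paragraphs are separated by one or more blank lines. The helper name `read_paragraphs` and the demo directory and file contents are hypothetical, made up for illustration.

```r
# Hypothetical helper: read one text file and return its paragraphs.
# Assumption: a paragraph is a run of non-blank lines, and paragraphs
# are separated by one or more blank (or whitespace-only) lines.
read_paragraphs <- function(path) {
  lines <- readLines(path, warn = FALSE)
  # Rebuild the whole file as one string so blank lines become "\n\n".
  text <- paste(lines, collapse = "\n")
  # Split wherever a newline is followed by one or more blank lines.
  paras <- unlist(strsplit(text, "\n([ \t]*\n)+", perl = TRUE))
  paras <- trimws(paras)
  paras[nzchar(paras)]
}

# Demo data in a temporary folder (stand-in for the real text folder).
demo_dir <- file.path(tempdir(), "paragraph_demo")
dir.create(demo_dir, showWarnings = FALSE)
writeLines(c("First paragraph, line one.",
             "First paragraph, line two.",
             "",
             "Second paragraph."),
           file.path(demo_dir, "a.txt"))
writeLines(c("Third paragraph.", "", "", "Fourth paragraph."),
           file.path(demo_dir, "b.txt"))

# Read every file exactly once; no exists()/rbind() bookkeeping needed.
files <- list.files(demo_dir, pattern = "\\.txt$", full.names = TRUE)
paragraphs <- unlist(lapply(files, read_paragraphs), use.names = FALSE)

print(length(paragraphs))

# With the tm package installed, each paragraph can become one document:
# p_corpus <- tm::Corpus(tm::VectorSource(paragraphs))
```

Because each file is read once inside `lapply()`, the same code works for a single file or for many, and no intermediate `10File.txt` has to be written and read back.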

0 answers:

No answers yet