Splitting a file in R and automatically creating Notepad (.txt) files

Time: 2014-07-11 19:29:37

Tags: r

I have a file that looks like this:

"1943" 359 1327 "t000000" 8
"1944" 359 907 "t000000" 8
"1946" 359 472 "t000000" 8
"1947" 359 676 "t000000" 8
"1948" 326 359 "t000000" 8
"1949" 359 585 "t000000" 8
"1950" 359 1157 "t000000" 8
"2460" 275 359 "t000000" 8
"2727" 22 556 "t000000" 8
"2730" 22 676 "t000000" 8
"479" 17 1898 "t0000000" 5
"864" 347 720 "t000s" 12
"3646" 349 691 "t000s" 7
"6377" 870 1475 "t000s" 14
"7690" 566 870 "t000s" 14
"7691" 870 2305 "t000s" 14
"8120" 870 1179 "t000s" 14
"8122" 44 870 "t000s" 14
"8124" 870 1578 "t000s" 14
"8125" 206 870 "t000s" 14  
"8126" 870 1834 "t000s" 14
"6455" 1 1019 "t000t" 13
"4894" 126 691 "t00t" 9
"4896" 126 170 "t00t" 9
"560" 17 412 "t0t" 7
"130" 65 522 "tq" 18
"1034" 17 990 "tq" 10
"332" 3 138 "ts" 2
"2063" 61 383 "ts" 5
"2089" 127 147 "ts" 11
"2431" 148 472 "ts" 15
"2706" 28 43 "ts" 21
.....................

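The first column is an arbitrary row number (left over from some sorting I needed), and the fourth column contains the patterns for which I actually want separate text files.

What I want is a set of separate text files, e.g. f1.txt, f2.txt, f3.txt, ..., each containing all the rows that share a value in column 4. For example, one file for "t000000", another for "t000s", another for "t00t", and so on.

I did this: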

list2env(split(sort, sort[,4]),envir=.GlobalEnv)

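Here sort is the name of my dataset (read from a text file) and 4 is the column. I could then call write.table on each piece, but my file is very large: I may end up with around 100 such pieces, and running write.table by hand for each one is impractical. Is there a way to automate this?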

2 answers:

Answer 0: (score: 1)

Using the excellent data.table package:

library(data.table)

# get your source file
the_file <- fread('~/Desktop/file.txt') #replace with your file path

# vector of unique values of column 4 & the roots of your output filename
fl_names <- unique(the_file$V4)

# dump all the relevant subsets to files
for (f in fl_names) write.table(the_file[V4==f, ], paste0(f, '.txt'), row.names=FALSE)
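
For reference, V4 in the loop above is the default name given to the fourth column when the input has no header row. A minimal base-R sketch of the same loop follows. It assumes a few inline sample rows in place of the real file and writes into tempdir(); the names txt, dat and out_dir are made up for illustration.

```r
# A few sample rows standing in for the real file (assumption for illustration)
txt <- '"1943" 359 1327 "t000000" 8
"864" 347 720 "t000s" 12
"3646" 349 691 "t000s" 7
"4894" 126 691 "t00t" 9'

# With no header row, read.table names the columns V1, V2, ...
dat <- read.table(text = txt, header = FALSE, stringsAsFactors = FALSE)

out_dir <- tempdir()  # hypothetical output location

# One file per distinct value of column 4, named after that value
for (f in unique(dat$V4)) {
  write.table(dat[dat$V4 == f, ],
              file.path(out_dir, paste0(f, ".txt")),
              row.names = FALSE)
}
```

Because the loop runs over unique(dat$V4), it should scale to around 100 patterns without any manual write.table calls.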

Answer 1: (score: 1)

You have already found split. Instead of list2env, just use lapply, which gives you more flexibility:

# Generally confusing to name a data.frame
# the same as a common function!
X <- split(sort, sort[, 4])
invisible(lapply(names(X), function(y)
  write.csv(X[[y]], file = paste0(y, ".csv"))))

Proof of concept:

Dir <- getwd()                     # Won't be necessary in your actual script
setwd(tempdir())                   # I just don't want my working directory filled
list.files(pattern = "\\.csv$")    # with random csv files, so I'm using tempdir()
# character(0)                     # Note that there are no csv files presently
X <- split(sort, sort[, 4])        # You've already figured this step out
## invisible is just so you don't have to see an empty list
## printed in your console. The rest is pretty straightforward
invisible(lapply(names(X), function(y)
  write.csv(X[[y]], file = paste0(y, ".csv"))))
list.files(pattern = "\\.csv$")    # Check that the files are there
# [1] "t000000.csv"  "t0000000.csv" "t000s.csv"    "t000t.csv"   
# [5] "t00t.csv"     "t0t.csv"      "tq.csv"       "ts.csv" 
setwd(Dir)                         # Won't be necessary for your actual script
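
The question asked for files named f1.txt, f2.txt, f3.txt, ... rather than files named after the pattern. One possible sketch of that variant is below. It assumes a few inline sample rows, output into tempdir(), and a small lookup table to record which pattern went to which file; txt, dat, parts, out_files and index are illustrative names only.

```r
# A few sample rows standing in for the real file (assumption for illustration)
txt <- '"1943" 359 1327 "t000000" 8
"864" 347 720 "t000s" 12
"3646" 349 691 "t000s" 7
"4894" 126 691 "t00t" 9'

dat <- read.table(text = txt, header = FALSE, stringsAsFactors = FALSE)

# One data.frame per distinct value of column 4
parts <- split(dat, dat$V4)

# Numbered file names f1.txt, f2.txt, ... in a scratch directory
out_dir   <- tempdir()
out_files <- file.path(out_dir, paste0("f", seq_along(parts), ".txt"))

# Write each piece to its numbered file, without row or column names
invisible(Map(function(d, path) {
  write.table(d, path, row.names = FALSE, col.names = FALSE)
}, parts, out_files))

# Lookup table: which pattern ended up in which file
index <- data.frame(pattern = names(parts), file = basename(out_files))
```

Keeping the index data.frame around (or writing it out as well) makes it possible to tell later which numbered file holds which pattern.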