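r - Split one csv file into multiple txt files

Date: 2017-09-18 20:05:26

Tags: r split tab-delimited-text

I need to split a large .csv file (about 9 columns and 9,000+ rows) into a separate .txt file for each row, naming each new file after the value in its first column.

E.g., for the .csv file: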

01001_r1    32.4327 -86.6190    0.65    0.20    0.15    1.33    5.47    8
01001_r2    32.4327 -86.6190    0.65    0.20    0.15    1.33    5.46    8
01001_r3    32.4327 -86.6190    0.80    0.15    0.05    1.33    5.23    10
01003_r1    30.4887 -87.6918    0.65    0.20    0.15    1.33    5.23    9
01003_r2    30.4887 -87.6918    0.80    0.15    0.05    1.33    5.25    9
01003_r3    30.4887 -87.6918    0.65    0.20    0.15    1.33    4.96    8

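I would end up with 6 files, each containing a single row.

The columns in the output files need to be tab-delimited, and the files must not contain row names or column names.

For example, an output file should look like this:

01001_r1    32.4327 -86.6190    0.65    0.20    0.15    1.33    5.47    8

Here is what I have so far: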

#set 'working directory'
setwd('C:/Users/Data/soils_data/sitesoil_in')

#identify data frame from .csv file
sd <- read.csv('site_soil.csv', sep="\t", header=F, fill=F)

lapply(1:nrow(sd), function(i) write.csv(sd[i,],
                                         file = paste0(sd[i,1], ".txt"),
                                         row.names = F, header = F,
                                         quote = F))

This is what I get for each output file:

File name: 01001_r1

V1,V2,V3,V4,V5,V6,V7,V8,V9
01001_r1,32.4327,-86.619,0.65,0.2,0.15,1.33,5.47,8

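I can't get it to drop the column names or to separate the columns with tabs. I have tried header = F or col.names = F to remove the header, and sep = "\t" to separate the columns, but it doesn't recognize those arguments.

I would appreciate any help. Thanks.

Based on all the suggestions, here is simpler code that solves the problem: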

#set 'working directory'
setwd('C:/Users/Elena/Desktop/DayCent_muvp_MODEL/DayCent_SourceData/soils_data/sitesoil_in')

#identify data frame from .csv file
sd <- read.csv('site_soil.csv', sep="\t", header=F, fill=F)

lapply(1:nrow(sd), 
       function(i) write.table(sd[i,],
                               file = paste0(sd[i,1], ".txt"),
                               row.names = FALSE, col.names = FALSE,
                               quote = FALSE,
                               sep = "\t"
       ))
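
A note on why switching to write.table helps, as far as I understand the documentation: write.csv is a wrapper around write.table that fixes sep and col.names (attempts to change them are ignored with a warning), and header is an argument of the read.* functions rather than the write.* ones. Also, write.table quotes character and factor columns by default, so without quote = FALSE the ID in the first column is written inside double quotes. The sketch below uses a made-up one-row data frame and temporary files (both are assumptions for illustration) to show the difference:

```r
# Hypothetical single row standing in for one line of site_soil.csv
row <- data.frame(V1 = "01001_r1", V2 = 32.4327, V9 = 8L,
                  stringsAsFactors = FALSE)

quoted_file <- tempfile(fileext = ".txt")
plain_file  <- tempfile(fileext = ".txt")

# Default quoting: character columns are wrapped in double quotes
write.table(row, file = quoted_file, sep = "\t",
            row.names = FALSE, col.names = FALSE)

# quote = FALSE: values are written as-is
write.table(row, file = plain_file, sep = "\t", quote = FALSE,
            row.names = FALSE, col.names = FALSE)

quoted_line <- readLines(quoted_file)
plain_line  <- readLines(plain_file)
```

Comparing quoted_line and plain_line shows whether quotes ended up in the file.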

Thanks, everyone, for your help.

3 answers:

Answer 0: (score: 1)

Try this

dat <-"01001_r1,32.4327,-86.6190,0.65,0.20,0.15,1.33,5.47,8
01001_r2,32.4327,-86.6190,0.65,0.20,0.15,1.33,5.46,8
01001_r3,32.4327,-86.6190,0.80,0.15,0.05,1.33,5.23,10
01003_r1,30.4887,-87.6918,0.65,0.20,0.15,1.33,5.23,9
01003_r2,30.4887,-87.6918,0.80,0.15,0.05,1.33,5.25,9
01003_r3,30.4887,-87.6918,0.65,0.20,0.15,1.33,4.96,8
"


df <- read.delim(file = textConnection(dat), sep = ',', header = FALSE)

df
#         V1      V2       V3   V4   V5   V6   V7   V8 V9
# 1 01001_r1 32.4327 -86.6190 0.65 0.20 0.15 1.33 5.47  8
# 2 01001_r2 32.4327 -86.6190 0.65 0.20 0.15 1.33 5.46  8
# 3 01001_r3 32.4327 -86.6190 0.80 0.15 0.05 1.33 5.23 10
# 4 01003_r1 30.4887 -87.6918 0.65 0.20 0.15 1.33 5.23  9
# 5 01003_r2 30.4887 -87.6918 0.80 0.15 0.05 1.33 5.25  9
# 6 01003_r3 30.4887 -87.6918 0.65 0.20 0.15 1.33 4.96  8

output_file_base <- "soil_"
output_file_ext <- ".tsv"

for(i in seq(nrow(df))){
    output_file <- paste0(output_file_base, as.character(i), output_file_ext)
    dfi <- df[i, ]
    write.table(x = dfi, file = output_file, sep = '\t', quote = FALSE, col.names = FALSE, row.names = FALSE)
}

Output:

$ cat soil_6.tsv
01003_r3    30.4887 -87.6918    0.65    0.2 0.15    1.33    4.96    8
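
Note that the output above shows 0.2 where the input had 0.20: once a column is parsed as numeric, trailing zeros are not kept. If the output must match the input text exactly, one option is to read every column as character. This is a minimal sketch, not part of the original answer; the two-row sample string and the temporary file are assumptions for illustration:

```r
# Two sample rows in the same comma-separated layout as the answer's data
dat <- "01001_r1,32.4327,-86.6190,0.65,0.20,0.15,1.33,5.47,8
01003_r3,30.4887,-87.6918,0.65,0.20,0.15,1.33,4.96,8"

# colClasses = "character" keeps each field exactly as it appears in the text
df_chr <- read.csv(text = dat, header = FALSE, colClasses = "character")

out_file <- tempfile(fileext = ".tsv")
write.table(df_chr[1, ], file = out_file, sep = "\t", quote = FALSE,
            col.names = FALSE, row.names = FALSE)

line <- readLines(out_file)
```

The trade-off is that the columns are strings, so any numeric work on them needs an explicit as.numeric() first.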

Answer 1: (score: 1)

I adjusted your code:

[code block not preserved in the source]

Answer 2: (score: 1)

This may work for what you are trying to accomplish.

df  <-read.csv(text = "01001_r1,32.4327,-86.6190,0.65,0.20,0.15,1.33,5.47,8
01001_r2,32.4327,-86.6190,0.65,0.20,0.15,1.33,5.46,8
01001_r3,32.4327,-86.6190,0.80,0.15,0.05,1.33,5.23,10
01003_r1,30.4887,-87.6918,0.65,0.20,0.15,1.33,5.23,9
01003_r2,30.4887,-87.6918,0.80,0.15,0.05,1.33,5.25,9
01003_r3,30.4887,-87.6918,0.65,0.20,0.15,1.33,4.96,8",
stringsAsFactors = FALSE,
header = FALSE)


apply(df, 1, function(x){write.table(t(x), 
                                     file = paste0(x[1],".txt"), 
                                     sep = "\t", 
                                     quote = FALSE, 
                                     col.names = FALSE, 
                                     row.names = FALSE)})
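
One caveat with apply() on a data frame, as far as I know: it first converts the data frame to a character matrix via as.matrix(), which formats each numeric column to a common width, so values can pick up padding spaces (for example " 8" next to "10"). A possible alternative, sketched here under the assumption of a small made-up sample and a temporary output directory, is to split the data frame into one-row data frames and write each with write.table:

```r
# Small sample with a column whose values differ in width (10 vs 9)
df <- read.csv(text = "01001_r3,32.4327,0.80,10
01003_r1,30.4887,0.65,9",
               header = FALSE, stringsAsFactors = FALSE)

out_dir <- tempdir()

# One single-row data frame per input row; column types are preserved
rows <- split(df, seq_len(nrow(df)))

invisible(lapply(rows, function(r) {
  write.table(r,
              file = file.path(out_dir, paste0(r[[1]], ".txt")),
              sep = "\t", quote = FALSE,
              col.names = FALSE, row.names = FALSE)
}))

line <- readLines(file.path(out_dir, "01003_r1.txt"))
```

Because each row stays a data frame, the numeric fields are written individually rather than padded to match the rest of the column.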