How to get output in a group-wise format and export it in the same format to a CSV file?

Date: 2019-07-07 00:24:24

Tags: r data.table

I need the output in a group-by-group format, where each group starts with the group's name.

library(data.table)

dt <- data.table(
  Type     = c("t", "t", "c", "c", "c"),
  Time     = c("pre", "post", "pre", "post", "pre"),
  Student  = c(6, 6, 6, 7, 7),
  RollNum1 = c(49, 69, 44, 86, 39),
  Marks1   = c(8, 9, 10, 8, 5)
)

I want to group the table above by the Type and Time variables and, when exporting to CSV, get the output in the following format:

[Image: Output in single sheet and row-wise]

I tried using split:

dt_split <- split(dt, by = c("Type", "Time"))

But the output doesn't have the custom group and time names, and when I export it to CSV, the output isn't in the right format.
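A quick check of the attempt above suggests the group values are not actually lost: split() on a data.table (split.data.table) keeps them as the names of the returned list, pasted together with ".", in order of first appearance. A minimal sketch re-creating the same dt; keep.by = FALSE is used here only to show the pieces without the Type and Time columns. What the attempt is missing is writing those names into the file between the pieces.

```r
library(data.table)

dt <- data.table(
  Type     = c("t", "t", "c", "c", "c"),
  Time     = c("pre", "post", "pre", "post", "pre"),
  Student  = c(6, 6, 6, 7, 7),
  RollNum1 = c(49, 69, 44, 86, 39),
  Marks1   = c(8, 9, 10, 8, 5)
)

# keep.by = FALSE drops Type/Time from each piece; the group values are
# still available as the list names ("<Type>.<Time>").
dt_split <- split(dt, by = c("Type", "Time"), keep.by = FALSE)
print(names(dt_split))
print(dt_split[["c.pre"]])
```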

1 answer:

Answer 0 (score: 1)

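What you're asking for doesn't seem to be a proper CSV (one where every row has the same number of fields), but I take it your goal is to import the file into Excel afterwards.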

You should be able to get away with using the append argument, like this:

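f = tempfile()
dt[ , {
  # write the column names once, before the first group
  if (.GRP == 1L) {
    fwrite(.SD[0L], f)
  }
  # "group info" line, e.g. "Group=t; Time=pre"
  cat(paste(sprintf('%s=%s', names(.BY), unlist(.BY)), collapse = '; '),
      '\n', sep = '', file = f, append = TRUE)
  # this group's rows; with append = TRUE, fwrite omits the header
  fwrite(.SD, f, append = TRUE)
}, by = .(Group = Type, Time)]
cat(readLines(f), sep = '\n')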
# Student,RollNum1,Marks1
# Group=t; Time=pre
# 6,49,8
# Group=t; Time=post
# 6,69,9
# Group=c; Time=pre
# 6,44,10
# 7,39,5
# Group=c; Time=post
# 7,86,8

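The .GRP branch ensures we write the column names only for the first by group. Beyond that, for each group we first write the "group info" line and then write the rest of the data as plain CSV.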
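To see what .GRP and .BY hold inside j, the same label string can be built without writing any file. A small sketch on the question's dt; the column names grp, label and nrows are illustrative only, not part of the answer:

```r
library(data.table)

dt <- data.table(
  Type     = c("t", "t", "c", "c", "c"),
  Time     = c("pre", "post", "pre", "post", "pre"),
  Student  = c(6, 6, 6, 7, 7),
  RollNum1 = c(49, 69, 44, 86, 39),
  Marks1   = c(8, 9, 10, 8, 5)
)

# .GRP is the group counter (1, 2, ...); .BY is a named list holding this
# group's values of the `by` columns (names taken from `by`: Group, Time).
labels <- dt[, .(
  grp   = .GRP,
  label = paste(sprintf("%s=%s", names(.BY), unlist(.BY)), collapse = "; "),
  nrows = .N
), by = .(Group = Type, Time)]

print(labels)
```

Only the row with grp == 1 triggers the header write in the answer's code; every group gets its own label line.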

I'm not too sure how this will look when imported into Excel; you may need to add some blank columns to the header rows.
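One possible way to add those blank columns is to pad each group-info line with trailing commas, so it has as many fields as the data rows. This is a sketch adapted from the answer's code; the assumption that equal field counts make Excel's import behave better is not verified here.

```r
library(data.table)

dt <- data.table(
  Type     = c("t", "t", "c", "c", "c"),
  Time     = c("pre", "post", "pre", "post", "pre"),
  Student  = c(6, 6, 6, 7, 7),
  RollNum1 = c(49, 69, 44, 86, 39),
  Marks1   = c(8, 9, 10, 8, 5)
)

f <- tempfile(fileext = ".csv")
dt[, {
  # header row once, before the first group
  if (.GRP == 1L) fwrite(.SD[0L], f)
  label <- paste(sprintf("%s=%s", names(.BY), unlist(.BY)), collapse = "; ")
  # pad with ncol(.SD) - 1 empty fields so the label line has the same
  # number of fields as the data rows (assumption: this helps Excel)
  cat(label, strrep(",", ncol(.SD) - 1L), "\n", sep = "", file = f, append = TRUE)
  fwrite(.SD, f, append = TRUE)
}, by = .(Group = Type, Time)]

lines <- readLines(f)
cat(lines, sep = "\n")
```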

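FWIW, unless you have a specific use case, I'd recommend against this. The more typical way to write out grouped data is to use the directory structure (partitions) or the file names of the output to encode what the groups mean, e.g.: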

out_dir = tempdir()
dt[ , {
  # highly generic -- it will be cleaner and easier to read in your case
  #   to simply write out the directory names using `.BY` for the two groupers
  partition_names = sprintf('%s=%s', names(.BY), unlist(.BY))
  partitions = do.call(file.path, as.list(partition_names))
  out_subdir = file.path(out_dir, partitions)
  dir.create(out_subdir, recursive = TRUE, showWarnings = FALSE)
  # timestamp the file write time as a means of logging
  ts = as.integer(Sys.time())
  fwrite(.SD, file.path(out_subdir, sprintf('%d.csv', ts)))
}, by = .(Group = Type, Time)]
list.files(out_dir, recursive = TRUE)
# [1] "Group=c/Time=post/1562500495.csv" "Group=c/Time=pre/1562500495.csv" 
# [3] "Group=t/Time=post/1562500495.csv" "Group=t/Time=pre/1562500495.csv"
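With this layout the grouping can be recovered when reading the files back, since each directory name is a key=value pair. A sketch, assuming one fixed, hypothetical file name (part.csv) per partition instead of the timestamp, and using .BY directly for the two groupers as the code comment above suggests:

```r
library(data.table)

dt <- data.table(
  Type     = c("t", "t", "c", "c", "c"),
  Time     = c("pre", "post", "pre", "post", "pre"),
  Student  = c(6, 6, 6, 7, 7),
  RollNum1 = c(49, 69, 44, 86, 39),
  Marks1   = c(8, 9, 10, 8, 5)
)

# Write Group=<Type>/Time=<Time>/part.csv for each group
# ("part.csv" is a hypothetical fixed name used instead of a timestamp).
out_dir <- tempfile("partitioned")
dt[, {
  sub_dir <- file.path(out_dir, paste0("Group=", .BY$Group), paste0("Time=", .BY$Time))
  dir.create(sub_dir, recursive = TRUE, showWarnings = FALSE)
  fwrite(.SD, file.path(sub_dir, "part.csv"))
}, by = .(Group = Type, Time)]

# Read all parts back and turn the "key=value" directory names into columns.
rel_files <- list.files(out_dir, pattern = "\\.csv$", recursive = TRUE)
back <- rbindlist(lapply(rel_files, function(rel) {
  part <- fread(file.path(out_dir, rel))
  # e.g. "Group=c/Time=pre/part.csv" -> c("Group", "c"), c("Time", "pre")
  keys <- strsplit(dirname(rel), "/", fixed = TRUE)[[1]]
  for (kv in strsplit(keys, "=", fixed = TRUE)) {
    part[, (kv[1]) := kv[2]]
  }
  part
}))
print(back)
```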