I need to write my output group by group, with each group starting with the group name.
library(data.table)

dt <- data.table(
Type = c("t","t", "c", "c", "c"),
Time = c("pre", "post", "pre", "post", "pre"),
Student = c(6,6,6,7,7),
RollNum1 = c(49,69,44,86,39),
Marks1= c(8,9,10,8,5))
I want to group the table above by the Type and Time variables and, when exporting to CSV, get the output in the following format.
I tried using split:
dt_split <- split(dt, by = c("Type", "Time"))
But the output does not carry the custom group and time names, and when I export it to CSV the output is not formatted correctly.
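Answer 0 (score: 1)
What you are asking for is not really a proper CSV (where every row has the same number of fields), but I assume your goal is to import the file into Excel afterwards.
You should be able to get away with it by using an argument like append: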
f = tempfile()
dt[ , {
if (.GRP == 1L) {
fwrite(.SD[0L], f)
}
cat(paste(sprintf('%s=%s', names(.BY), unlist(.BY)), collapse = '; '),
'\n', sep = '', file = f, append = TRUE)
fwrite(.SD, f, append = TRUE)
}, by = .(Group = Type, Time)]
cat(readLines(f), sep = '\n')
# Student,RollNum1,Marks1
# Group=t; Time=pre
# 6,49,8
# Group=t; Time=post
# 6,69,9
# Group=c; Time=pre
# 6,44,10
# 7,39,5
# Group=c; Time=post
# 7,86,8
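The .GRP branch makes sure we write the column names only once, for the first by group. Apart from that, for each group we first write the "group info" line and then write the group's rows as plain CSV.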
I'm not sure how this will look when imported into Excel; you may need to add some blank columns to the header row.
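If the uneven field count does turn out to be a problem on import, one possible workaround is to pad each "group info" line with empty fields so that every row has as many fields as the data rows. This is only a sketch of that idea under my own assumptions: the padding variable and the choice to put the whole label in the first field are illustrative, and I have not checked how Excel renders the result.

```r
library(data.table)

dt = data.table(
  Type = c("t", "t", "c", "c", "c"),
  Time = c("pre", "post", "pre", "post", "pre"),
  Student = c(6, 6, 6, 7, 7),
  RollNum1 = c(49, 69, 44, 86, 39),
  Marks1 = c(8, 9, 10, 8, 5))

f = tempfile(fileext = '.csv')
dt[ , {
  # write the column names once, before the first group
  if (.GRP == 1L) {
    fwrite(.SD[0L], f)
  }
  label = paste(sprintf('%s=%s', names(.BY), unlist(.BY)), collapse = '; ')
  # assumption: one trailing comma per remaining column, so the label row
  # has the same number of fields as the data rows
  padding = strrep(',', ncol(.SD) - 1L)
  cat(label, padding, '\n', sep = '', file = f, append = TRUE)
  fwrite(.SD, f, append = TRUE)
}, by = .(Group = Type, Time)]

cat(readLines(f), sep = '\n')
```

The label itself contains no commas here, so it needs no quoting; if your group values can contain commas, you would have to quote the label yourself.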
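FWIW, unless you have a specific use case, I'd advise against doing this. A more typical way to write out grouped data is to encode the grouping in the directory structure or file names of the output (partitions), for example: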
out_dir = tempdir()
dt[ , {
# highly generic -- it will be cleaner and easier to read in your case
# to simply write out the directory names using `.BY` for the two groupers
partition_names = sprintf('%s=%s', names(.BY), unlist(.BY))
partitions = do.call(file.path, as.list(partition_names))
out_subdir = file.path(out_dir, partitions)
dir.create(out_subdir, recursive = TRUE, showWarnings = FALSE)
# timestamp the file write time as a means of logging
ts = as.integer(Sys.time())
fwrite(.SD, file.path(out_subdir, sprintf('%d.csv', ts)))
}, by = .(Group = Type, Time)]
list.files(out_dir, recursive = TRUE)
# [1] "Group=c/Time=post/1562500495.csv" "Group=c/Time=pre/1562500495.csv"
# [3] "Group=t/Time=post/1562500495.csv" "Group=t/Time=pre/1562500495.csv"
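One advantage of the partitioned layout is that the grouping can be recovered from the paths when reading the data back. Below is a rough sketch of that round trip, not a definitive implementation: read_partition is a hypothetical helper of mine, the file name data.csv and the partition_demo directory are arbitrary choices, and the directory names are written directly from .BY as suggested in the comment above.

```r
library(data.table)

dt = data.table(
  Type = c("t", "t", "c", "c", "c"),
  Time = c("pre", "post", "pre", "post", "pre"),
  Student = c(6, 6, 6, 7, 7),
  RollNum1 = c(49, 69, 44, 86, 39),
  Marks1 = c(8, 9, 10, 8, 5))

# use a fresh directory so that only our partition files are listed
out_dir = file.path(tempdir(), 'partition_demo')
unlink(out_dir, recursive = TRUE)

dt[ , {
  out_subdir = file.path(out_dir,
                         paste0('Group=', .BY$Group),
                         paste0('Time=', .BY$Time))
  dir.create(out_subdir, recursive = TRUE, showWarnings = FALSE)
  fwrite(.SD, file.path(out_subdir, 'data.csv'))
}, by = .(Group = Type, Time)]

# hypothetical helper: read one file and rebuild the grouping columns
# from the 'key=value' directory names in its relative path
read_partition = function(rel_path) {
  parts = strsplit(dirname(rel_path), '/', fixed = TRUE)[[1L]]
  kv = strsplit(parts, '=', fixed = TRUE)
  keys = vapply(kv, `[`, character(1L), 1L)
  vals = vapply(kv, `[`, character(1L), 2L)
  part = fread(file.path(out_dir, rel_path))
  part[ , (keys) := as.list(vals)]
  part[]
}

files = list.files(out_dir, recursive = TRUE)
restored = rbindlist(lapply(files, read_partition))
print(restored)
```

Note that the restored grouping columns come back as character and the row order follows the file listing, not the original table.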