我正在学习操作data.table变量的语法。虽然我可以做简单的事情,但我的理解对于更复杂的任务来说还不够彻底。例如,我想将以下数据转换为每行具有一个不同的“类型”值,基于“子类型”的值生成单独的列,并且当存在具有相同“类型/子类型的多个行时折叠唯一值“组合。
给出输入数据:
data = data.frame(
var1 = c("a","b","c","b","d","e","f"),
var2 = c("aa","bb","cc","dd","ee","ee","ff"),
subtype = c("1","2","2","2","1","1","2"),
type = c("A","A","A","A","B","B","B")
)
var1 var2 subtype type
1 a aa 1 A
2 b bb 2 A
3 c cc 2 A
4 b dd 2 A
5 d ee 1 B
6 e ee 1 B
7 f ff 2 B
我想得出:
1.var1 1.var2 2.var1 2.var2 2.type
A "a" "aa" "b|c" "bb|cc|dd" "A"
B "d|e" "ee" "f" "ff" "B"
使用数据框,我可以使用以下代码实现此目的:
data.derived = do.call(
rbind,
lapply(
split(data,list(data$type)),
function(x) {
do.call (
c,
lapply(
split(x, list(x$subtype)),
function(y) {
result = c(
var1 = paste(unique(y$var1),collapse ="|"),
var2 = paste(unique(y$var2),collapse ="|")
)
if (as.character(y$subtype[1]) == "2") {
result = c(result, type = as.character(y$type[1]))
}
result}))}))
如何使用数据表执行相同的操作?
答案 0 :(得分:5)
从结果中可以清楚地看到,您正在将数据从长格式转换为宽格式,并且子类型沿行方向分布,因此您需要dcast
中的data.table
。由于您希望将var1
和var2
中的值汇总为单个字符串,因此您需要将聚合函数自定义为paste
以折叠结果:
library(data.table)
setDT(data)
dcast(data, type ~ subtype, value.var = c("var1", "var2"),
fun = function(v) paste0(unique(v), collapse = "|"))
# type var1_function_1 var1_function_2 var2_function_1 var2_function_2
# 1: A a b|c aa bb|cc|dd
# 2: B d|e f ee ff
答案 1 :(得分:1)
不确定您是否要使用data.table包和命令,或者您是否想知道您的代码是否也适用于数据表。
我认为复杂的计算需要使用适当的包。上面的脚本适合你,但是如果它不是由你写的,很难看出它的作用。
在开始使用data.table之前,请检查一些不错的软件包,让您的生活更轻松。像
library(dplyr)
library(tidyr)
data = data.frame(
var1 = c("a","b","c","b","d","e","f"),
var2 = c("aa","bb","cc","dd","ee","ee","ff"),
subtype = c("1","2","2","2","1","1","2"),
type = c("A","A","A","A","B","B","B")
)
data %>%
group_by(type, subtype) %>%
summarise(x1 = paste(unique(var1),collapse ="|"),
x2 = paste(unique(var2),collapse ="|")) %>%
unite(xx,x1,x2) %>%
spread(subtype,xx) %>%
separate(`1`, c("1.var1","1.var2"), sep="_") %>%
separate(`2`, c("2.var1","2.var2"), sep="_") %>%
ungroup
# # A tibble: 2 x 5
# type 1.var1 1.var2 2.var1 2.var2
# * <fctr> <chr> <chr> <chr> <chr>
# 1 A a aa b|c bb|cc|dd
# 2 B d|e ee f ff
当拥有数据表而不是数据框时,您可以使用相同的代码,甚至是脚本。但是,如果您正在寻找使用不同故事的数据表命令。