Reshape a grouped data frame with several sets of consecutive variable columns

Asked: 2016-03-16 14:25:55

Tags: r group-by dplyr reshape melt

I am working with a data frame similar to this one:

set.seed(1)
test.df <- data.frame(Treatment= "CI",
                  period = seq(1, 3),
                  subject= 1,
                  X.1. = rnorm(6),
                  X.2. = rnorm(6),
                  X.3. = rnorm(6),
                  Y.1. = rnorm(6),
                  Y.2. = rnorm(6),
                  Y.3. = rnorm(6))
> test.df
  Treatment period subject       X.1.       X.2.        X.3.        Y.1.        Y.2.        Y.3.
1        CI      1       1 -0.6264538  0.4874291 -0.62124058  0.82122120  0.61982575  1.35867955
2        CI      2       1  0.1836433  0.7383247 -2.21469989  0.59390132 -0.05612874 -0.10278773
3        CI      3       1 -0.8356286  0.5757814  1.12493092  0.91897737 -0.15579551  0.38767161
4        CI      1       1  1.5952808 -0.3053884 -0.04493361  0.78213630 -1.47075238 -0.05380504
5        CI      2       1  0.3295078  1.5117812 -0.01619026  0.07456498 -0.47815006 -1.37705956
6        CI      3       1 -0.8204684  0.3898432  0.94383621 -1.98935170  0.41794156 -0.41499456

I would like my data to look like this:

  Treatment period subject Game           X           Y
1        CI      1       1    1  -0.6264538  0.82122120

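where Game runs over 1:3, and this is done for each group of c(Treatment, period). In the actual data, however, there are about 16 other similar variables besides X and Y. Inspired by this post, I tried the following: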

final.df<- test.df %>% 
group_by(Treatment, period) %>%
reshape(idvar=1:3, varying=4:ncol(test.df), sep=".", direction='long',times=1:3)

I get the following error:

Error in `row.names<-.data.frame`(`*tmp*`, value = paste(d[, idvar], times[1L],
  : invalid 'row.names' length

2 Answers:

Answer 0: (score: 2)

After fixing the names, you will still run into two problems: (1) your current IDs are not unique; (2) a tbl_df does not play well with reshape. So if you insist on using "dplyr", you would need to do something like this:

library(dplyr)
names(test.df) <- toupper(names(test.df))
test.df %>%
  group_by(TREATMENT, PERIOD) %>%
  mutate(NEW_ID = sequence(n())) %>%   # make the ID unique within each group
  data.frame %>%                       # convert back to a plain data.frame before reshape
  reshape(varying = grep("^X\\.|^Y\\.", names(test.df)),
          sep=".", direction = "long")
#     TREATMENT PERIOD SUBJECT NEW_ID time            X           Y id
# 1.1        CI      1       1      1    1  0.898037653 -1.62380441  1
# 2.1        CI      2       1      1    1 -0.265867132  1.19260758  2
# 3.1        CI      3       1      1    1  0.478254223  0.37225231  3
# 4.1        CI      1       1      2    1  0.193781526  0.78440441  4
# 5.1        CI      2       1      2    1 -0.785203396 -0.88621250  5
# 6.1        CI      3       1      2    1  0.341740150 -0.67919816  6
# 1.2        CI      1       1      1    2 -1.808196090  1.64211603  1
# 2.2        CI      2       1      1    2 -0.937445606 -0.35388758  2
# 3.2        CI      3       1      1    2  1.773354124  0.95633070  3
# 4.2        CI      1       1      2    2 -0.819681242  1.06421615  4
# 5.2        CI      2       1      2    2  0.003812118 -0.04835364  5
# 6.2        CI      3       1      2    2  0.226081490  0.50687855  6
# 1.3        CI      1       1      1    3  0.822497674 -0.55875020  1
# 2.3        CI      2       1      1    3  0.382695603 -0.83661977  2
# 3.3        CI      3       1      1    3  0.066738811 -1.96761492  3
# 4.3        CI      1       1      2    3  0.854280148 -0.49335882  4
# 5.3        CI      2       1      2    3 -1.635859887  1.18322984  5
# 6.3        CI      3       1      2    3 -0.020864680  1.20997470  6

However, I would suggest looking at melt from "data.table":

library(data.table)
melt(as.data.table(setnames(test.df, toupper(names(test.df)))),
     measure.vars = patterns("^X\\.", "^Y\\."),
     value.name = c("X", "Y"))
#     TREATMENT PERIOD SUBJECT variable            X           Y
#  1:        CI      1       1        1  0.898037653 -1.62380441
#  2:        CI      2       1        1 -0.265867132  1.19260758
#  3:        CI      3       1        1  0.478254223  0.37225231
#  4:        CI      1       1        1  0.193781526  0.78440441
#  5:        CI      2       1        1 -0.785203396 -0.88621250
#  6:        CI      3       1        1  0.341740150 -0.67919816
#  7:        CI      1       1        2 -1.808196090  1.64211603
#  8:        CI      2       1        2 -0.937445606 -0.35388758
#  9:        CI      3       1        2  1.773354124  0.95633070
# 10:        CI      1       1        2 -0.819681242  1.06421615
# 11:        CI      2       1        2  0.003812118 -0.04835364
# 12:        CI      3       1        2  0.226081490  0.50687855
# 13:        CI      1       1        3  0.822497674 -0.55875020
# 14:        CI      2       1        3  0.382695603 -0.83661977
# 15:        CI      3       1        3  0.066738811 -1.96761492
# 16:        CI      1       1        3  0.854280148 -0.49335882
# 17:        CI      2       1        3 -1.635859887  1.18322984
# 18:        CI      3       1        3 -0.020864680  1.20997470
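
The unique-ID point raised in this answer can also be checked and handled in base R alone. The following is only a minimal sketch built on the question's example data: the row_id column and the timevar name Game are illustrative choices, not part of the original answer, and the data frame is kept as a plain data.frame rather than a tbl_df.

```r
# Rebuild the example data from the question
set.seed(1)
test.df <- data.frame(Treatment = "CI",
                      period = seq(1, 3),
                      subject = 1,
                      X.1. = rnorm(6), X.2. = rnorm(6), X.3. = rnorm(6),
                      Y.1. = rnorm(6), Y.2. = rnorm(6), Y.3. = rnorm(6))

# Treatment, period and subject do not identify rows uniquely:
# rows 4 to 6 repeat the id values of rows 1 to 3
n_dup <- sum(duplicated(test.df[, c("Treatment", "period", "subject")]))

# Assumption: a simple running row number is an acceptable unique id
test.df$row_id <- seq_len(nrow(test.df))

# reshape() splits names such as "X.1." at the first "." into
# the variable name (X or Y) and the time value (1, 2, 3)
long <- reshape(test.df,
                idvar = "row_id",
                varying = grep("^[XY]\\.", names(test.df)),
                sep = ".",
                timevar = "Game",
                direction = "long")
```

Because row_id already exists in the data, reshape does not need to append its own id column, and each of the six original rows contributes one row per Game.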

Answer 1: (score: 0)

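# Stack every X/Y column into key-value pairs, keeping the three id columns
df.stk <- tidyr::gather(test.df, "xy", "value", -Treatment:-subject)
# Split names such as "X.1." into variable, Game and an empty trailing piece, then drop that piece (column 6)
df.sep <- tidyr::separate(df.stk, "xy", c("xy", "Game", "temp"), sep="\\.")[, -6]
# Spread X and Y back into columns; rows that share the same ids and Game are averaged by mean
df.final <- reshape2::dcast(df.sep, Treatment + period + subject + Game ~ xy, fun.aggregate=mean)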

Note, however, that all the X and Y names need to be either all lowercase or all uppercase.
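
For readers on tidyr 1.0.0 or later, the same stack, split and spread idea can be written as a single pivot_longer call. This is a swapped-in technique, not what the answer above uses, and it is only a sketch on the question's example data; the regular expression assumes every measure column is named like "X.1." (letters, a dot, digits, a trailing dot).

```r
library(tidyr)

# Rebuild the example data from the question
set.seed(1)
test.df <- data.frame(Treatment = "CI",
                      period = seq(1, 3),
                      subject = 1,
                      X.1. = rnorm(6), X.2. = rnorm(6), X.3. = rnorm(6),
                      Y.1. = rnorm(6), Y.2. = rnorm(6), Y.3. = rnorm(6))

# ".value" turns the first capture group (X or Y) into output column names;
# the second capture group (the digits) becomes the Game column
long <- pivot_longer(test.df,
                     cols = -c(Treatment, period, subject),
                     names_to = c(".value", "Game"),
                     names_pattern = "^([A-Za-z]+)\\.([0-9]+)\\.$")

# Game is extracted from the column names as text, so convert it
long$Game <- as.integer(long$Game)
```

Unlike the dcast step with fun.aggregate=mean, this keeps every original row (6 rows times 3 games gives 18 rows) rather than averaging rows that share the same Treatment, period and subject.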