Suppose I have the following data structure:
structure(list(treatment = c("DD", "DR", "RD", "RR", "DD", "DR",
"RD", "RR", "DD", "DR", "RD", "RR", "DD", "DR", "RD", "RR", "DD",
"DR", "RD", "RR", "DD", "DR", "RD", "RR", "DD", "DR", "RD", "RR",
"DD", "DR", "RD", "RR", "DD", "DR", "RD", "RR", "DD", "DR", "RD",
"RR", "DD", "DR", "RD", "RR", "DD", "DR", "RD", "RR", "DD", "DR",
"RD", "RR", "DD", "DR", "RD", "RR", "DD", "DR", "RD", "RR", "DD",
"DR", "RD", "RR", "DD", "DR", "RD", "RR", "DD", "DR", "RD", "RR",
"DD", "DR", "RD", "RR", "DD", "DR", "RD", "RR", "DD", "DR", "RD",
"RR", "DD", "DR", "RD", "RR", "DD", "DR", "RD", "RR", "DD", "DR",
"RD", "RR", "DD", "DR", "RD", "RR", "DD", "DR", "RD", "RR", "DD",
"DR", "RD", "RR", "DD", "DR", "RD", "RR", "DD", "DR", "RD", "RR",
"DD", "DR", "RD", "RR", "DD", "DR", "RD", "RR"), correct = c(0.428571428571429,
0.6, 0.625, 0.75, 0.757142857142857, 0.725, 0.675, 0.65, 0.971428571428571,
0.875, 0.875, 0.875, 0.442857142857143, 0.35, 0.325, 0.425, 0.942857142857143,
0.975, 0.925, 0.9, 0.171428571428571, 0.15, 0.175, 0.2375, 0.714285714285714,
0.925, 0.95, 0.825, 0.957142857142857, 0.925, 0.9, 0.9125, 0.228571428571429,
0.275, 0.275, 0.4625, 0.9, 0.8, 0.825, 0.725, 0.971428571428571,
0.9, 0.85, 0.9375, 0.885714285714286, 0.925, 0.925, 0.95, 0.857142857142857,
0.85, 0.85, 0.825, 0.857142857142857, 0.75, 0.75, 0.925, 0.942857142857143,
0.925, 0.925, 0.825, 0.871428571428571, 0.8, 0.8, 0.6375, 0.957142857142857,
0.925, 0.925, 0.85, 1, 0.925, 0.9, 0.975, 0.971428571428571,
0.925, 0.9, 0.9375, 0.9, 0.925, 0.95, 1, 0.971428571428571, 0.95,
0.95, 1, 0.914285714285714, 0.95, 0.95, 0.95, 0.614285714285714,
0.775, 0.8, 0.575, 0.428571428571429, 0.575, 0.575, 0.45, 0.2,
0.375, 0.375, 0.4625, 0.971428571428571, 0.975, 0.975, 0.975,
0.9, 0.8, 0.8, 0.8625, 0.885714285714286, 0.9, 0.85, 0.8125,
0.2, 0.275, 0.3, 0.2875, 0.671428571428571, 0.775, 0.8, 0.875,
0.971428571428571, 0.95, 0.95, 1)), class = c("tbl_df", "tbl",
"data.frame"), row.names = c(NA, -124L))
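Every four rows hold the values for one question (one row per treatment group). For each block of four rows I want to compute (DD - DR) and (RR - RD) and store the results in two separate columns.
I know about the "diff" function, which would get me there indirectly if I split the data into one subset containing only DD and DR and another containing only RD and RR, but I would prefer a more explicit approach.
The resulting table would have four columns ("treatment", "correct", "DD-DR" and "RR-RD"), where the last two columns are computed per question (every four rows) by explicitly taking the difference between DD and DR and between RR and RD.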
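Answer 0 (score: 2)
You can create an id that identifies each question. With that in place, the simplest route is to use spread to put each treatment in its own column. After that you can compute DD-DR and RR-RD. If you want to get back to the original long format you can use gather, but this step is optional.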
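library(dplyr)
library(tidyr)

# Label each consecutive block of four rows as one question
df$id <- rep(1:(nrow(df)/4), each = 4)

df %>%
  # One column per treatment, one row per question
  spread(key = treatment, value = correct) %>%
  mutate(DD_DR = DD - DR,
         RR_RD = RR - RD) %>%
  # Optional: return to the original long format
  gather(key = treatment, value = correct, -id, -DD_DR, -RR_RD) %>%
  select(id, treatment, correct, DD_DR, RR_RD) %>%
  arrange(id) %>%
  head(10)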
# A tibble: 10 x 5
id treatment correct DD_DR RR_RD
<int> <chr> <dbl> <dbl> <dbl>
1 1 DD 0.429 -0.171 0.125
2 1 DR 0.6 -0.171 0.125
3 1 RD 0.625 -0.171 0.125
4 1 RR 0.75 -0.171 0.125
5 2 DD 0.757 0.0321 -0.025
6 2 DR 0.725 0.0321 -0.025
7 2 RD 0.675 0.0321 -0.025
8 2 RR 0.65 0.0321 -0.025
9 3 DD 0.971 0.0964 0
10 3 DR 0.875 0.0964 0
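In tidyr 1.0.0 and later, spread and gather are superseded by pivot_wider and pivot_longer. The following is a minimal sketch of the same approach with the newer functions, assuming a recent tidyr is installed. The small df here is a stand-in holding only the first two questions from the data above, and res is just an illustrative name.

```r
library(dplyr)
library(tidyr)

# Stand-in for the question's data: the first two questions only
df <- tibble(
  treatment = rep(c("DD", "DR", "RD", "RR"), times = 2),
  correct   = c(0.428571428571429, 0.6, 0.625, 0.75,
                0.757142857142857, 0.725, 0.675, 0.65)
)

res <- df %>%
  # Label each consecutive block of four rows as one question
  mutate(id = rep(seq_len(n() / 4), each = 4)) %>%
  # One column per treatment, one row per question
  pivot_wider(names_from = treatment, values_from = correct) %>%
  mutate(DD_DR = DD - DR,
         RR_RD = RR - RD) %>%
  # Optional: return to the original long format
  pivot_longer(c(DD, DR, RD, RR),
               names_to = "treatment", values_to = "correct") %>%
  select(id, treatment, correct, DD_DR, RR_RD)

print(res)
```

As with gather, the pivot_longer step is only needed if the long layout has to be kept.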
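Answer 1 (score: 1)
Assuming the rows of each group are always adjacent and every group always has exactly four rows, is this what you are looking for?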
library(tidyverse)

dat %>%
  # Rows 1-4 get id 1, rows 5-8 get id 2, and so on
  group_by(id = 1 + (row_number()-1) %/% 4) %>%
  mutate(dd_less_dr =
           sum(if_else(treatment == "DD", correct, 0)) -
           sum(if_else(treatment == "DR", correct, 0)),
         rr_less_rd =
           sum(if_else(treatment == "RR", correct, 0)) -
           sum(if_else(treatment == "RD", correct, 0)))
# A tibble: 124 x 5
# Groups: id [31]
treatment correct id dd_less_dr rr_less_rd
<chr> <dbl> <dbl> <dbl> <dbl>
1 DD 0.429 1 -0.171 0.125
2 DR 0.6 1 -0.171 0.125
3 RD 0.625 1 -0.171 0.125
4 RR 0.75 1 -0.171 0.125
5 DD 0.757 2 0.0321 -0.025
6 DR 0.725 2 0.0321 -0.025
7 RD 0.675 2 0.0321 -0.025
8 RR 0.65 2 0.0321 -0.025
...
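A shorter variant of the same grouped mutate picks the value by logical subsetting instead of sum with if_else. This is only a sketch, and it assumes each block of four rows contains every treatment exactly once. The sum/if_else version above would count a missing treatment as 0, whereas this version depends on the value being present. The dat below is a stand-in with the first two questions only.

```r
library(dplyr)

# Stand-in for the question's data: the first two questions only
dat <- tibble(
  treatment = rep(c("DD", "DR", "RD", "RR"), times = 2),
  correct   = c(0.428571428571429, 0.6, 0.625, 0.75,
                0.757142857142857, 0.725, 0.675, 0.65)
)

res <- dat %>%
  # Rows 1-4 get id 1, rows 5-8 get id 2, and so on
  group_by(id = 1 + (row_number() - 1) %/% 4) %>%
  # Within each group, pick the single value for each treatment
  mutate(dd_less_dr = correct[treatment == "DD"] - correct[treatment == "DR"],
         rr_less_rd = correct[treatment == "RR"] - correct[treatment == "RD"]) %>%
  ungroup()

print(res)
```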
Answer 2 (score: 1)
How about:
dat %>%
  mutate(group = ceiling(row_number()/4)) %>%
  spread(key = treatment, value = correct) %>%
  mutate(`DD-DR` = DD - DR,
         `RR - RD` = RR - RD)
# A tibble: 31 x 7
group DD DR RD RR `DD-DR` `RR - RD`
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 1 0.429 0.6 0.625 0.75 -0.171 0.125
2 2 0.757 0.725 0.675 0.65 0.0321 -0.025
3 3 0.971 0.875 0.875 0.875 0.0964 0
4 4 0.443 0.35 0.325 0.425 0.0929 0.100
Thanks to lukeA for the ceiling/row_number code.
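The answers above all derive the question id from row position, so they depend on each consecutive block of four rows holding DD, DR, RD and RR exactly once. A quick check of that assumption before computing the differences might look like the sketch below. blocks_are_complete is a hypothetical helper written for illustration, not part of any package or of the answers.

```r
# Hypothetical helper: TRUE when every consecutive block of four rows
# holds DD, DR, RD and RR exactly once
blocks_are_complete <- function(treatment) {
  # An empty vector or a length that is not a multiple of 4 cannot qualify
  if (length(treatment) == 0 || length(treatment) %% 4 != 0) return(FALSE)
  group <- ceiling(seq_along(treatment) / 4)
  expected <- c("DD", "DR", "RD", "RR")
  all(tapply(treatment, group, function(x) identical(sort(x), expected)))
}

good <- rep(c("DD", "DR", "RD", "RR"), times = 3)
bad  <- c("DD", "DR", "RD", "RR", "DD", "DD", "RD", "RR")  # second block has no DR

blocks_are_complete(good)  # TRUE
blocks_are_complete(bad)   # FALSE
```

With the question's data this would be called as blocks_are_complete(dat$treatment).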