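I'd like to use dplyr to calculate a vector of ratios between two factor levels for each visit of each subject. The mock data can be created as follows: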
subj = c(rep("A", 10), rep("B", 4), rep("C", 6))
vist = c(rep(c("C0", "C1", "C2", "C3", "C4"), each=2),
         rep(c("C0", "C1"), each=2),
         rep(c("C0", "C1", "C2"), each=2))
factor = c(rep(c("L", "N"), 5), rep(c("L", "N"), 2), rep(c("L", "N"), 3))
set.seed(111)
aval = round(rnorm(n = 20, 0, 1), 2)
dat = data.frame(subj, vist, factor, aval, stringsAsFactors = FALSE)
dat
It looks like:
subj vist factor aval
1 A C0 L 0.24
2 A C0 N -0.33
3 A C1 L -0.31
4 A C1 N -2.30
5 A C2 L -0.17
6 A C2 N 0.14
7 A C3 L -1.50
8 A C3 N -1.01
9 A C4 L -0.95
10 A C4 N -0.49
11 B C0 L -0.17
12 B C0 N -0.41
13 B C1 L 1.85
14 B C1 N 0.39
15 C C0 L 0.80
16 C C0 N -1.57
17 C C1 L -0.09
18 C C1 N -0.36
19 C C2 L -1.19
20 C C2 N 0.36
For each subject (subj) at each visit (vist), what is needed is the ratio of the value (aval) for factor "N" over the value for factor "L". For example, the first ratio is -0.33/0.24 (-1.375), from subject A at visit C0. Thanks!
Answer 0 (score: 3)
You can reshape the data with spread from the tidyr package; after that, computing the new column is easy:
library(tidyr)
library(dplyr)
dat %>%
  spread(factor, aval) %>%   # one column per level of factor: L and N
  mutate(ratio = N/L)
subj vist L N ratio
1 A C0 0.24 -0.33 -1.3750000
2 A C1 -0.31 -2.30 7.4193548
3 A C2 -0.17 0.14 -0.8235294
4 A C3 -1.50 -1.01 0.6733333
5 A C4 -0.95 -0.49 0.5157895
6 B C0 -0.17 -0.41 2.4117647
7 B C1 1.85 0.39 0.2108108
8 C C0 0.80 -1.57 -1.9625000
9 C C1 -0.09 -0.36 4.0000000
10 C C2 -1.19 0.36 -0.3025210
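In tidyr 1.0.0 and later, spread() is superseded by pivot_wider(). A minimal sketch of the same idea with pivot_wider() follows, assuming tidyr >= 1.0.0 and dplyr are installed; to stay self-contained it uses only the first four rows of the question's data (subject A, visits C0 and C1).

```r
library(dplyr)
library(tidyr)

# First four rows of the question's data, typed in from the printed output
dat <- data.frame(
  subj   = c("A", "A", "A", "A"),
  vist   = c("C0", "C0", "C1", "C1"),
  factor = c("L", "N", "L", "N"),
  aval   = c(0.24, -0.33, -0.31, -2.30),
  stringsAsFactors = FALSE
)

# Each level of `factor` becomes its own column (L and N),
# leaving one row per subj/vist pair
wide <- dat %>%
  pivot_wider(names_from = factor, values_from = aval) %>%
  mutate(ratio = N / L)

wide
```

pivot_wider() matches rows by the remaining identifier columns (subj and vist) rather than by position, so it does not depend on the rows being sorted.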
Answer 1 (score: 1)
If each group has exactly one N and one L, you can do:
dat %>%
  group_by(subj, vist) %>%
  summarise(ratio = aval[factor == "N"]/aval[factor == "L"])
#Source: local data frame [10 x 3]
#Groups: subj [?]
# subj vist ratio
# <chr> <chr> <dbl>
#1 A C0 -1.3750000
#2 A C1 7.4193548
#3 A C2 -0.8235294
#4 A C3 0.6733333
#5 A C4 0.5157895
#6 B C0 2.4117647
#7 B C1 0.2108108
#8 C C0 -1.9625000
#9 C C1 4.0000000
#10 C C2 -0.3025210
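The one-liner relies on every subj/vist pair having exactly one "N" row and one "L" row; if one is missing, aval[factor == "N"] (or the "L" lookup) is a zero-length vector, so the summary no longer yields one value for that group. Below is a sketch of a guarded variant using a hypothetical helper, safe_ratio(), that returns NA for such groups. It assumes dplyr >= 1.0.0 (for the .groups argument), and the three-row data frame is illustrative, with the "N" row of A/C1 deliberately left out.

```r
library(dplyr)

# Illustrative data: A/C0 has both levels, A/C1 is missing its "N" row
dat <- data.frame(
  subj   = c("A", "A", "A"),
  vist   = c("C0", "C0", "C1"),
  factor = c("L", "N", "L"),
  aval   = c(0.24, -0.33, -0.31),
  stringsAsFactors = FALSE
)

# Hypothetical helper: N/L when the group has exactly one of each level, NA otherwise
safe_ratio <- function(value, level) {
  n <- value[level == "N"]
  l <- value[level == "L"]
  if (length(n) == 1 && length(l) == 1) n / l else NA_real_
}

ratios <- dat %>%
  group_by(subj, vist) %>%
  summarise(ratio = safe_ratio(aval, factor), .groups = "drop")

ratios
```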
Answer 2 (score: 1)
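In base R, you can use aggregate to build a summary of the ratios, or ave to fill the ratios back into the original data.frame. Both assume the data.frame is regular (exactly two rows per subj/vist pair) and correctly sorted (the "L" row before the "N" row), because the function divides the second value in each group by the first.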
aggregate(dat$aval, dat[c("subj", "vist")], FUN=function(x) x[2] / x[1])
subj vist x
1 A C0 -1.3750000
2 B C0 2.4117647
3 C C0 -1.9625000
4 A C1 7.4193548
5 B C1 0.2108108
6 C C1 4.0000000
7 A C2 -0.8235294
8 C C2 -0.3025210
9 A C3 0.6733333
10 A C4 0.5157895
Or, to add the ratios as a new variable in the original data.frame:
dat$rat <- ave(dat$aval, dat$subj, dat$vist, FUN=function(x) x[2] / x[1])
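A base R sketch that does not depend on row order: reshape() with direction = "wide" puts the "L" and "N" values into separate columns (named aval.L and aval.N because of the default sep = "."), much like spread(). The data are the first four rows of the question, with the two A/C0 rows deliberately swapped to show that sort order no longer matters.

```r
# First four rows of the question's data; the A/C0 rows are swapped on purpose
dat <- data.frame(
  subj   = c("A", "A", "A", "A"),
  vist   = c("C0", "C0", "C1", "C1"),
  factor = c("N", "L", "L", "N"),
  aval   = c(-0.33, 0.24, -0.31, -2.30),
  stringsAsFactors = FALSE
)

# One row per subj/vist pair, with one aval column per level of `factor`
wide <- reshape(dat, idvar = c("subj", "vist"), timevar = "factor",
                direction = "wide")

# Columns are picked by name, not position, so row order does not matter
wide$ratio <- wide$aval.N / wide$aval.L
wide
```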
Answer 3 (score: 1)
If the rows are always in the same order (L then N) and there is exactly one pair per 'subj', 'vist':
dat$ratio <- rep(dat$aval[c(FALSE, TRUE)]/dat$aval[c( TRUE, FALSE)], each = 2)
dat$ratio
#[1] -1.3750000 -1.3750000 7.4193548 7.4193548 -0.8235294 -0.8235294
#[7] 0.6733333 0.6733333 0.5157895 0.5157895 2.4117647 2.4117647
#[13] 0.2108108 0.2108108 -1.9625000 -1.9625000 4.0000000 4.0000000
#[19] -0.3025210 -0.3025210
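The c(FALSE, TRUE) and c(TRUE, FALSE) indices are recycled along aval, picking the even ("N") and odd ("L") rows. Because the result is silently wrong if that layout assumption fails, a sketch that checks the layout with stopifnot() before computing the ratios may be useful; the aval values are the question's printed values, typed in directly so the example does not depend on the random seed.

```r
# The question's data, with aval typed in from the printed output
dat <- data.frame(
  subj   = c(rep("A", 10), rep("B", 4), rep("C", 6)),
  vist   = c(rep(c("C0", "C1", "C2", "C3", "C4"), each = 2),
             rep(c("C0", "C1"), each = 2),
             rep(c("C0", "C1", "C2"), each = 2)),
  factor = rep(c("L", "N"), 10),
  aval   = c(0.24, -0.33, -0.31, -2.30, -0.17, 0.14, -1.50, -1.01, -0.95, -0.49,
             -0.17, -0.41, 1.85, 0.39, 0.80, -1.57, -0.09, -0.36, -1.19, 0.36),
  stringsAsFactors = FALSE
)

odd  <- c(TRUE, FALSE)  # recycled: rows 1, 3, 5, ... (expected "L")
even <- c(FALSE, TRUE)  # recycled: rows 2, 4, 6, ... (expected "N")

# Stop with an error unless rows come in L/N pairs sharing subj and vist
stopifnot(
  nrow(dat) %% 2 == 0,
  all(dat$factor[odd] == "L"),
  all(dat$factor[even] == "N"),
  identical(dat$subj[odd], dat$subj[even]),
  identical(dat$vist[odd], dat$vist[even])
)

# Same computation as the answer: one N/L ratio per pair, repeated for both rows
dat$ratio <- rep(dat$aval[even] / dat$aval[odd], each = 2)
dat$ratio
```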