I have the following data structure:
date <- as.Date(as.character( c("2015-02-13",
"2015-02-13",
"2015-02-13",
"2015-02-13",
"2015-02-13",
"2015-02-13",
"2015-02-13",
"2015-02-13",
"2015-02-13",
"2015-02-14",
"2015-02-14",
"2015-02-14",
"2015-02-14",
"2015-02-14",
"2015-02-14",
"2015-02-14",
"2015-02-14",
"2015-02-14",
"2015-02-15",
"2015-02-15",
"2015-02-15",
"2015-02-15",
"2015-02-15",
"2015-02-15",
"2015-02-15",
"2015-02-15",
"2015-02-15")))
name <- c("John","Michael","Thomas",
"John","Michael","Thomas",
"John","Michael","Thomas",
"John","Michael","Thomas",
"John","Michael","Thomas",
"John","Michael","Thomas",
"John","Michael","Thomas",
"John","Michael","Thomas",
"John","Michael","Thomas")
drinks <-c("Beer","Coffee","Tee",
"Tee","Beer", "Coffee",
"Coffee","Tee","Beer",
"Beer","Coffee","Tee",
"Tee","Beer", "Coffee",
"Coffee","Tee","Beer",
"Beer","Coffee","Tee",
"Tee","Beer", "Coffee",
"Coffee","Tee","Beer")
consumed <- c(3,2,5,3,6,2,9,4,5,
1,3,5,8,0,1,2,3,5,
1,24,4,5,7,9,9,1,2)
version_1 <- data.frame(date,name,drinks,consumed)
My second data frame is almost identical; only the consumed values differ:
consumed <- c(10,9,1,20,30,1,50,40,20,
10,2,10,2,1,1,2,3,5,
20,24,1,40,2,8,4,0,0)
version_2 <- data.frame(date,name,drinks,consumed)
version_1$version <- rep("one", nrow(version_1))
version_2$version <- rep("two", nrow(version_2))
all <- rbind(version_1, version_2)
all$version <- as.factor(all$version)
date name drinks consumed version
1 2015-02-13 John Beer 3 one
2 2015-02-13 Michael Coffee 2 one
3 2015-02-13 Thomas Tee 5 one
4 2015-02-13 John Tee 3 one
5 2015-02-13 Michael Beer 6 one
6 2015-02-13 Thomas Coffee 2 one
7 2015-02-13 John Coffee 9 one
8 2015-02-13 Michael Tee 4 one
9 2015-02-13 Thomas Beer 5 one
10 2015-02-14 John Beer 1 one
11 2015-02-14 Michael Coffee 3 one
12 2015-02-14 Thomas Tee 5 one
13 2015-02-14 John Tee 8 one
14 2015-02-14 Michael Beer 0 one
15 2015-02-14 Thomas Coffee 1 one
16 2015-02-14 John Coffee 2 one
17 2015-02-14 Michael Tee 3 one
18 2015-02-14 Thomas Beer 5 one
19 2015-02-15 John Beer 1 one
20 2015-02-15 Michael Coffee 24 one
21 2015-02-15 Thomas Tee 4 one
22 2015-02-15 John Tee 5 one
23 2015-02-15 Michael Beer 7 one
24 2015-02-15 Thomas Coffee 9 one
25 2015-02-15 John Coffee 9 one
26 2015-02-15 Michael Tee 1 one
27 2015-02-15 Thomas Beer 2 one
28 2015-02-13 John Beer 10 two
29 2015-02-13 Michael Coffee 9 two
30 2015-02-13 Thomas Tee 1 two
31 2015-02-13 John Tee 20 two
32 2015-02-13 Michael Beer 30 two
33 2015-02-13 Thomas Coffee 1 two
34 2015-02-13 John Coffee 50 two
35 2015-02-13 Michael Tee 40 two
36 2015-02-13 Thomas Beer 20 two
37 2015-02-14 John Beer 10 two
38 2015-02-14 Michael Coffee 2 two
39 2015-02-14 Thomas Tee 10 two
40 2015-02-14 John Tee 2 two
41 2015-02-14 Michael Beer 1 two
42 2015-02-14 Thomas Coffee 1 two
43 2015-02-14 John Coffee 2 two
44 2015-02-14 Michael Tee 3 two
45 2015-02-14 Thomas Beer 5 two
46 2015-02-15 John Beer 20 two
47 2015-02-15 Michael Coffee 24 two
48 2015-02-15 Thomas Tee 1 two
49 2015-02-15 John Tee 40 two
50 2015-02-15 Michael Beer 2 two
51 2015-02-15 Thomas Coffee 8 two
52 2015-02-15 John Coffee 4 two
53 2015-02-15 Michael Tee 0 two
54 2015-02-15 Thomas Beer 0 two
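I want to loop through the data frame and test for differences between the two groups (version one vs. version two). Every name and drinks combination appears exactly once per day. So, for example, I want to test:
2015-02-13 John Beer 3 one
2015-02-14 John Beer 1 one
2015-02-15 John Beer 1 one
against
2015-02-13 John Beer 10 two
2015-02-14 John Beer 10 two
2015-02-15 John Beer 20 two
and do the same for every other name and drinks pair across the dates.
I can't figure out how to achieve this. This is as far as I got: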
for (i in 1:length(date)){
temp <- all[all$date==date[i],]
}
Answer 0 (score: 2)
Using data.table:
library(data.table)
setDT(all)
all[, t.test(consumed[version == "one"], consumed[version == "two"]), by = .(name,drinks)]
name drinks statistic parameter p.value conf.int estimate null.value alternative method data.name
1: John Beer -3.4320324 2.159744 0.06761534 -25.303554 1.666667 0 two.sided Welch Two Sample t-test consumed[version == "one"] and consumed[version == "two"]
2: John Beer -3.4320324 2.159744 0.06761534 1.970221 13.333333 0 two.sided Welch Two Sample t-test consumed[version == "one"] and consumed[version == "two"]
3: Michael Coffee -0.2067737 3.960582 0.84638132 -28.960658 9.666667 0 two.sided Welch Two Sample t-test consumed[version == "one"] and consumed[version == "two"]
4: Michael Coffee -0.2067737 3.960582 0.84638132 24.960658 11.666667 0 two.sided Welch Two Sample t-test consumed[version == "one"] and consumed[version == "two"]
5: Thomas Tee 0.2208631 2.049375 0.84525800 -12.025434 4.666667 0 two.sided Welch Two Sample t-test consumed[version == "one"] and consumed[version == "two"]
6: Thomas Tee 0.2208631 2.049375 0.84525800 13.358768 4.000000 0 two.sided Welch Two Sample t-test consumed[version == "one"] and consumed[version == "two"]
7: John Tee -1.3850647 2.070089 0.29640280 -61.453187 5.333333 0 two.sided Welch Two Sample t-test consumed[version == "one"] and consumed[version == "two"]
8: John Tee -1.3850647 2.070089 0.29640280 30.786521 20.666667 0 two.sided Welch Two Sample t-test consumed[version == "one"] and consumed[version == "two"]
9: Michael Beer -0.6835859 2.210972 0.55885626 -45.015433 4.333333 0 two.sided Welch Two Sample t-test consumed[version == "one"] and consumed[version == "two"]
10: Michael Beer -0.6835859 2.210972 0.55885626 31.682100 11.000000 0 two.sided Welch Two Sample t-test consumed[version == "one"] and consumed[version == "two"]
11: Thomas Coffee 0.1942572 3.977345 0.85549254 -8.883193 4.000000 0 two.sided Welch Two Sample t-test consumed[version == "one"] and consumed[version == "two"]
12: Thomas Coffee 0.1942572 3.977345 0.85549254 10.216527 3.333333 0 two.sided Welch Two Sample t-test consumed[version == "one"] and consumed[version == "two"]
13: John Coffee -0.7570982 2.088564 0.52510317 -77.499374 6.666667 0 two.sided Welch Two Sample t-test consumed[version == "one"] and consumed[version == "two"]
14: John Coffee -0.7570982 2.088564 0.52510317 53.499374 18.666667 0 two.sided Welch Two Sample t-test consumed[version == "one"] and consumed[version == "two"]
15: Michael Tee -0.9049035 2.018804 0.46026242 -66.647341 2.666667 0 two.sided Welch Two Sample t-test consumed[version == "one"] and consumed[version == "two"]
16: Michael Tee -0.9049035 2.018804 0.46026242 43.314008 14.333333 0 two.sided Welch Two Sample t-test consumed[version == "one"] and consumed[version == "two"]
17: Thomas Beer -0.7113284 2.110684 0.54726281 -29.270500 4.000000 0 two.sided Welch Two Sample t-test consumed[version == "one"] and consumed[version == "two"]
18: Thomas Beer -0.7113284 2.110684 0.54726281 20.603833 8.333333 0 two.sided Welch Two Sample t-test consumed[version == "one"] and consumed[version == "two"]
This runs t.test on the two groups (consumed[version == "one"] and consumed[version == "two"]) separately for each combination of name and drinks (by = .(name,drinks)).
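Each group produces two rows because conf.int and estimate each contain two values (the lower and upper confidence bounds, and the two group means). All other columns are single values, so they are recycled across both rows.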
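If a flat table with one row per group is all that is needed, one option is to return only scalar fields of the htest object from j. The following is a minimal sketch of that idea, not part of the original answer: the column names (df, ci.lower, ci.upper, mean.one, mean.two) and the name summary_dt are chosen here purely for illustration, and the data is rebuilt with rep() so that the block runs on its own.

```r
library(data.table)

# Rebuild the question's data: 3 dates x 9 name/drinks rows, two versions.
date   <- as.Date(rep(c("2015-02-13", "2015-02-14", "2015-02-15"), each = 9))
name   <- rep(c("John", "Michael", "Thomas"), times = 9)
drinks <- rep(c("Beer", "Coffee", "Tee",
                "Tee", "Beer", "Coffee",
                "Coffee", "Tee", "Beer"), times = 3)
one <- c(3, 2, 5, 3, 6, 2, 9, 4, 5,
         1, 3, 5, 8, 0, 1, 2, 3, 5,
         1, 24, 4, 5, 7, 9, 9, 1, 2)
two <- c(10, 9, 1, 20, 30, 1, 50, 40, 20,
         10, 2, 10, 2, 1, 1, 2, 3, 5,
         20, 24, 1, 40, 2, 8, 4, 0, 0)

all <- data.table(
  date     = rep(date, 2),
  name     = rep(name, 2),
  drinks   = rep(drinks, 2),
  consumed = c(one, two),
  version  = factor(rep(c("one", "two"), each = 27))
)

# Run the test once per group and keep only length-one values,
# so each name/drinks combination yields exactly one row.
summary_dt <- all[, {
  tt <- t.test(consumed[version == "one"], consumed[version == "two"])
  list(
    statistic = unname(tt$statistic),
    df        = unname(tt$parameter),
    p.value   = tt$p.value,
    ci.lower  = tt$conf.int[1],   # illustrative column names
    ci.upper  = tt$conf.int[2],
    mean.one  = unname(tt$estimate[1]),
    mean.two  = unname(tt$estimate[2])
  )
}, by = .(name, drinks)]

print(summary_dt)
```

Splitting conf.int and estimate into separate columns keeps every column atomic, which makes the table easy to sort or filter, at the cost of discarding the rest of the htest object.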
We can avoid this by wrapping the call in list(...), which stores each complete test result in the data.table as an element of a list column:
result <- all[, .(ttest = list(t.test(consumed[version == "one"], consumed[version == "two"]))), by = .(name,drinks)]
result
name drinks ttest
1: John Beer <htest>
2: Michael Coffee <htest>
3: Thomas Tee <htest>
4: John Tee <htest>
5: Michael Beer <htest>
6: Thomas Coffee <htest>
7: John Coffee <htest>
8: Michael Tee <htest>
9: Thomas Beer <htest>
We can then retrieve an individual result with:
result[name == "John" & drinks == "Beer", ttest]
[[1]]
Welch Two Sample t-test
data: consumed[version == "one"] and consumed[version == "two"]
t = -3.432, df = 2.1597, p-value = 0.06762
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-25.303554 1.970221
sample estimates:
mean of x mean of y
1.666667 13.333333
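Because each element of the ttest column is a full htest object, individual fields can be pulled out into ordinary columns afterwards. The sketch below is an illustration under that assumption rather than part of the original answer. It adds a p.value column with vapply() and sorts by it, and it rebuilds the data so the block is self-contained.

```r
library(data.table)

# Rebuild the question's data so this block runs on its own.
date   <- as.Date(rep(c("2015-02-13", "2015-02-14", "2015-02-15"), each = 9))
name   <- rep(c("John", "Michael", "Thomas"), times = 9)
drinks <- rep(c("Beer", "Coffee", "Tee",
                "Tee", "Beer", "Coffee",
                "Coffee", "Tee", "Beer"), times = 3)
one <- c(3, 2, 5, 3, 6, 2, 9, 4, 5,
         1, 3, 5, 8, 0, 1, 2, 3, 5,
         1, 24, 4, 5, 7, 9, 9, 1, 2)
two <- c(10, 9, 1, 20, 30, 1, 50, 40, 20,
         10, 2, 10, 2, 1, 1, 2, 3, 5,
         20, 24, 1, 40, 2, 8, 4, 0, 0)

all <- data.table(
  date     = rep(date, 2),
  name     = rep(name, 2),
  drinks   = rep(drinks, 2),
  consumed = c(one, two),
  version  = factor(rep(c("one", "two"), each = 27))
)

# One htest object per name/drinks group, stored in a list column.
result <- all[, .(ttest = list(t.test(consumed[version == "one"],
                                      consumed[version == "two"]))),
              by = .(name, drinks)]

# Extract the p-value of every stored test into a plain numeric column.
result[, p.value := vapply(ttest, function(tt) tt$p.value, numeric(1))]

# Show the groups ordered from smallest to largest p-value.
print(result[order(p.value), .(name, drinks, p.value)])
```

The same vapply() pattern works for any other length-one field of the test object, such as statistic or parameter.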