I have a very large dataset and want to write concise code for the analysis.
Here is an illustrative example:
df <- data.frame(
ID = factor(sample(c("A","B","C","D","E","F","G"), 20, replace=TRUE)),
a1 = runif(20),
a2 = runif(20),
a3 = runif(20),
a4 = runif(20),
b1 = runif(20),
b2 = runif(20),
b3 = runif(20),
b4 = runif(20))
I want to run paired-sample tests like these (as an example):
t.test(df$a1, df$b1, paired=TRUE, na.rm=TRUE)
t.test(df$a2, df$b2, paired=TRUE, na.rm=TRUE)
This works, but I would like shorter code, so I tried:
object_a <- paste("a", 1:4, sep="")
object_b <- paste("b", 1:4, sep="")
t.test.func.paired <- function(x) {
t.test(x, y, paired = TRUE, na.rm=TRUE)
}
df %>%
select_(.dots = c(object_a, object_b)) %>%
sapply(., t.test.func.paired) %>%
.[c("statistic", "parameter", "p.value"), ] %>%
View()
Unfortunately, this does not work. Where is the mistake? Thanks!
Answer 0 (score: 0)
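Here is an approach using the dplyr and broom packages. broom automatically stores the t.test results in a data frame, so you don't have to extract the individual pieces yourself.
The key is to build every variable pair you need and then run the corresponding test for each pair. Note that this relies on the column names being in matching order (a1, a2, ..., b1, b2, ...), because the pairs are formed by position. dplyr lets you avoid writing an explicit loop over the variable pairs.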
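To see what broom contributes before it is applied to every pair, here is a minimal sketch on a single pair of vectors. The vectors x and y and the seed are assumptions made up for this illustration; they are not part of the question's data.

```r
library(broom)

set.seed(1)        # assumed seed, only so the sketch is reproducible
x <- runif(20)     # hypothetical "before" measurements
y <- runif(20)     # hypothetical "after" measurements

# tidy() converts the htest object returned by t.test() into a one-row data frame
res <- tidy(t.test(x, y, paired = TRUE))
names(res)
```

Because each test becomes a single row, the results of several tests can simply be stacked into one table.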
library(dplyr)
library(broom)
# dataset
df <- data.frame(
ID = factor(sample(c("A","B","C","D","E","F","G"), 20, replace=TRUE)),
a1 = runif(20),
a2 = runif(20),
a3 = runif(20),
a4 = runif(20),
b1 = runif(20),
b2 = runif(20),
b3 = runif(20),
b4 = runif(20))
# split the column names into the two groups to be paired
object_a = names(df)[grep("^a", names(df))]
object_b = names(df)[grep("^b", names(df))]
cbind(object_a, object_b) %>%                   # pair the names by position: (a1, b1), (a2, b2), ...
  data.frame(., stringsAsFactors = FALSE) %>%   # turn the pairs into a data frame
  rowwise() %>%                                 # process one pair per row
  do(data.frame(.,                              # keep the names of the pair
                tidy(t.test(df[, .$object_a],   # paired t-test for this pair, returned as a data frame
                            df[, .$object_b],
                            paired = TRUE)))) %>%   # na.rm removed: t.test() has no such argument
  ungroup()                                     # drop the rowwise grouping
# # A tibble: 4 × 10
# object_a object_b estimate statistic p.value parameter conf.low conf.high method alternative
# * <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <fctr> <fctr>
# 1 a1 b1 -0.03689665 -0.5253532 0.6054150 19 -0.1838941 0.11010078 Paired t-test two.sided
# 2 a2 b2 -0.09111585 -1.2358669 0.2315703 19 -0.2454267 0.06319499 Paired t-test two.sided
# 3 a3 b3 0.07515723 0.7721983 0.4494961 19 -0.1285545 0.27886900 Paired t-test two.sided
# 4 a4 b4 0.04359102 0.4317255 0.6708003 19 -0.1677402 0.25492223 Paired t-test two.sided
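As for why the original attempt fails: t.test.func.paired takes only x, so y is never defined inside it, and sapply hands the function one column at a time rather than a pair. A base R sketch that passes both column names explicitly might look like the following. The helper name paired_t, the seed, and the compact way the data frame is built are assumptions for illustration, not part of the original post.

```r
set.seed(42)  # assumed seed, only so the sketch is reproducible

# Hypothetical stand-in for the question's data: 20 rows, columns a1..a4 and b1..b4
df <- as.data.frame(matrix(runif(20 * 8), nrow = 20,
                           dimnames = list(NULL, c(paste0("a", 1:4), paste0("b", 1:4)))))

object_a <- paste0("a", 1:4)
object_b <- paste0("b", 1:4)

# Run one paired t-test on two named columns and keep three summary values
paired_t <- function(a, b) {
  tt <- t.test(df[[a]], df[[b]], paired = TRUE)
  c(statistic = unname(tt$statistic),
    parameter = unname(tt$parameter),
    p.value   = tt$p.value)
}

# mapply walks both name vectors in parallel: (a1, b1), (a2, b2), ...
results <- t(mapply(paired_t, object_a, object_b))
rownames(results) <- paste(object_a, object_b, sep = "_vs_")
results
```

This yields one row per pair with the statistic, degrees of freedom, and p-value, which is the same information the question tried to select with `.[c("statistic", "parameter", "p.value"), ]`.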