在我的数据集中,我有几个月的数据。
df=structure(list(id = c(1030879980L, 1030879990L), jan = c(170L,
265L), feb = c(153L, 332L), march = c(170L, 290L), apr = c(1L,
425L), may = c(66L, 406L), jume = c(125L, 352L), jul = c(129L,
339L), aug = c(-109L, 470L), sept = c(56L, 486L), oct = c(37L,
440L), nov = c(52L, 589L), dec = c(63L, 659L)), .Names = c("id",
"jan", "feb", "march", "apr", "may", "jume", "jul", "aug", "sept",
"oct", "nov", "dec"), class = "data.frame", row.names = c(NA,
-2L))
我必须为每个id执行一个样本学生的t检验 使用参考值
这里有参考值的数据
ref=structure(list(jan = 507L, feb = 502L, march = 431L, apr = 429L,
may = 449L, jume = 368L, jul = 406L, aug = 290L, sept = 309L,
oct = 371L, nov = 481L, dec = 536L), .Names = c("jan", "feb",
"march", "apr", "may", "jume", "jul", "aug", "sept", "oct", "nov",
"dec"), class = "data.frame", row.names = c(NA, -1L))
所以我这么简单
#the first id 1030879980
a = c(170,153,170,1,66,125,129,-109,56,37,52,63)
#jan reference values for january
t.test (a, mu=507)
#feb reference values for febrary
t.test (a, mu=502)
但是我怎么能按月为每个id执行它? 当然,我(手动),它会很长。这里有很多内容。
答案 0 :(得分:2)
您可以执行以下操作。将结果存储在结果列表中。循环遍历id,存储t.tests并将名称命名为id' s。
result <- vector("list", nrow(df))
for(i in seq_along(df$id)) {
result[[i]] <- t(sapply(t(ref), function (x) t.test(df[i, -1], mu=x, data.name = i)))
}
names(result) <- df$id
result
$`1030879980`
statistic parameter p.value conf.int estimate null.value alternative method data.name
[1,] -18.52462 11 1.214017e-09 Numeric,2 76.08333 507 "two.sided" "One Sample t-test" "df[i, -1]"
[2,] -18.30968 11 1.375191e-09 Numeric,2 76.08333 502 "two.sided" "One Sample t-test" "df[i, -1]"
[3,] -15.25747 11 9.526113e-09 Numeric,2 76.08333 431 "two.sided" "One Sample t-test" "df[i, -1]"
[4,] -15.17149 11 1.01107e-08 Numeric,2 76.08333 429 "two.sided" "One Sample t-test" "df[i, -1]"
[5,] -16.03127 11 5.649016e-09 Numeric,2 76.08333 449 "two.sided" "One Sample t-test" "df[i, -1]"
[6,] -12.54917 11 7.334351e-08 Numeric,2 76.08333 368 "two.sided" "One Sample t-test" "df[i, -1]"
[7,] -14.18274 11 2.05238e-08 Numeric,2 76.08333 406 "two.sided" "One Sample t-test" "df[i, -1]"
[8,] -9.196035 11 1.696865e-06 Numeric,2 76.08333 290 "two.sided" "One Sample t-test" "df[i, -1]"
[9,] -10.01282 11 7.298909e-07 Numeric,2 76.08333 309 "two.sided" "One Sample t-test" "df[i, -1]"
[10,] -12.67813 11 6.598342e-08 Numeric,2 76.08333 371 "two.sided" "One Sample t-test" "df[i, -1]"
[11,] -17.40691 11 2.357586e-09 Numeric,2 76.08333 481 "two.sided" "One Sample t-test" "df[i, -1]"
[12,] -19.7713 11 6.045928e-10 Numeric,2 76.08333 536 "two.sided" "One Sample t-test" "df[i, -1]"
$`1030879990`
statistic parameter p.value conf.int estimate null.value alternative method data.name
[1,] -2.523598 11 0.02829384 Numeric,2 421.0833 507 "two.sided" "One Sample t-test" "df[i, -1]"
[2,] -2.376735 11 0.03671235 Numeric,2 421.0833 502 "two.sided" "One Sample t-test" "df[i, -1]"
[3,] -0.2912785 11 0.7762574 Numeric,2 421.0833 431 "two.sided" "One Sample t-test" "df[i, -1]"
[4,] -0.2325333 11 0.8203937 Numeric,2 421.0833 429 "two.sided" "One Sample t-test" "df[i, -1]"
[5,] -0.8199858 11 0.4296343 Numeric,2 421.0833 449 "two.sided" "One Sample t-test" "df[i, -1]"
[6,] 1.559197 11 0.1472385 Numeric,2 421.0833 368 "two.sided" "One Sample t-test" "df[i, -1]"
[7,] 0.4430371 11 0.6663254 Numeric,2 421.0833 406 "two.sided" "One Sample t-test" "df[i, -1]"
[8,] 3.850262 11 0.00269828 Numeric,2 421.0833 290 "two.sided" "One Sample t-test" "df[i, -1]"
[9,] 3.292182 11 0.007176877 Numeric,2 421.0833 309 "two.sided" "One Sample t-test" "df[i, -1]"
[10,] 1.471079 11 0.1692895 Numeric,2 421.0833 371 "two.sided" "One Sample t-test" "df[i, -1]"
[11,] -1.75991 11 0.106165 Numeric,2 421.0833 481 "two.sided" "One Sample t-test" "df[i, -1]"
[12,] -3.375404 11 0.006192718 Numeric,2 421.0833 536 "two.sided" "One Sample t-test" "df[i, -1]"
答案 1 :(得分:2)
这是宽数据集挑战的一个很好的例子,其中月份值是单独的列而不是指标中的值,月,与其相邻的相应数值的字段。保持数据长,大多数操作(包括t.test
跨相关数据都更容易运行。
因此,作为替代方案,请考虑重新整形,然后为每个 ID 运行by
,在12个月 mu 值中迭代调用t.test
:
reshape_df <- reshape(df, varying = names(df)[-1], idvar="id", v.names="value",
times = names(df)[-1], timevar="month",
new.row.names = 1:1000, direction = "long")
reshape_ref <- reshape(ref, varying = names(ref), v.names="mu",
times = names(ref), timevar="month",
new.row.names = 1:1000, direction = "long")
ttest_list <- by(reshape_df, reshape_df$id, function(sub)
do.call(rbind, lapply(reshape_ref$mu, function(x) t.test(sub$value, mu=x))))
<强>输出强>
ttest_list$`1030879980`
statistic parameter p.value conf.int estimate null.value alternative method data.name
[1,] -18.52462 11 1.214017e-09 Numeric,2 76.08333 507 "two.sided" "One Sample t-test" "sub$value"
[2,] -18.30968 11 1.375191e-09 Numeric,2 76.08333 502 "two.sided" "One Sample t-test" "sub$value"
[3,] -15.25747 11 9.526113e-09 Numeric,2 76.08333 431 "two.sided" "One Sample t-test" "sub$value"
[4,] -15.17149 11 1.01107e-08 Numeric,2 76.08333 429 "two.sided" "One Sample t-test" "sub$value"
[5,] -16.03127 11 5.649016e-09 Numeric,2 76.08333 449 "two.sided" "One Sample t-test" "sub$value"
[6,] -12.54917 11 7.334351e-08 Numeric,2 76.08333 368 "two.sided" "One Sample t-test" "sub$value"
[7,] -14.18274 11 2.05238e-08 Numeric,2 76.08333 406 "two.sided" "One Sample t-test" "sub$value"
[8,] -9.196035 11 1.696865e-06 Numeric,2 76.08333 290 "two.sided" "One Sample t-test" "sub$value"
[9,] -10.01282 11 7.298909e-07 Numeric,2 76.08333 309 "two.sided" "One Sample t-test" "sub$value"
[10,] -12.67813 11 6.598342e-08 Numeric,2 76.08333 371 "two.sided" "One Sample t-test" "sub$value"
[11,] -17.40691 11 2.357586e-09 Numeric,2 76.08333 481 "two.sided" "One Sample t-test" "sub$value"
[12,] -19.7713 11 6.045928e-10 Numeric,2 76.08333 536 "two.sided" "One Sample t-test" "sub$value"
ttest_list$`1030879990`
statistic parameter p.value conf.int estimate null.value alternative method data.name
[1,] -2.523598 11 0.02829384 Numeric,2 421.0833 507 "two.sided" "One Sample t-test" "sub$value"
[2,] -2.376735 11 0.03671235 Numeric,2 421.0833 502 "two.sided" "One Sample t-test" "sub$value"
[3,] -0.2912785 11 0.7762574 Numeric,2 421.0833 431 "two.sided" "One Sample t-test" "sub$value"
[4,] -0.2325333 11 0.8203937 Numeric,2 421.0833 429 "two.sided" "One Sample t-test" "sub$value"
[5,] -0.8199858 11 0.4296343 Numeric,2 421.0833 449 "two.sided" "One Sample t-test" "sub$value"
[6,] 1.559197 11 0.1472385 Numeric,2 421.0833 368 "two.sided" "One Sample t-test" "sub$value"
[7,] 0.4430371 11 0.6663254 Numeric,2 421.0833 406 "two.sided" "One Sample t-test" "sub$value"
[8,] 3.850262 11 0.00269828 Numeric,2 421.0833 290 "two.sided" "One Sample t-test" "sub$value"
[9,] 3.292182 11 0.007176877 Numeric,2 421.0833 309 "two.sided" "One Sample t-test" "sub$value"
[10,] 1.471079 11 0.1692895 Numeric,2 421.0833 371 "two.sided" "One Sample t-test" "sub$value"
[11,] -1.75991 11 0.106165 Numeric,2 421.0833 481 "two.sided" "One Sample t-test" "sub$value"
[12,] -3.375404 11 0.006192718 Numeric,2 421.0833 536 "two.sided" "One Sample t-test" "sub$value"