My dataset looks like this:
# 4 compounds x 12 samples of random values (no seed is set, so the numbers differ between runs)
mat <- matrix(rnorm(12 * 4), ncol = 12)
colnames(mat) <- c("AC-1", "AC-2", "AC-3", "AM-1", "AM-2", "AM-3", "SC-1", "SC-2", "SC-3", "SM-1", "SM-2", "SM-3")
# data.frame() turns "AC-1" into "AC.1" because check.names = TRUE by default
df <- data.frame(compound = c("alanine", "arginine", "asparagine", "aspartate"), mat)
df
compound AC.1 AC.2 AC.3 AM.1 AM.2 AM.3 SC.1 SC.2 SC.3 SM.1
1 alanine 1.18362683 -2.03779314 -0.7217692 -1.7569264 -0.8381042 0.06866567 0.2327702 -1.1558879 1.2077454 0.437707310
2 arginine -0.19610110 0.05361113 0.6478384 -0.1768597 0.5905398 -0.67945600 -0.2221109 1.4032349 0.2387620 0.598236199
3 asparagine 0.02540509 0.47880021 -0.1395198 0.8394257 1.9046667 0.31175358 -0.5626059 0.3596091 -1.0963363 -1.004673116
4 aspartate -1.36397906 0.91380826 2.0630076 -0.6817453 -0.2713498 -2.01074098 1.4619707 -0.7257269 0.2851122 -0.007027878
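I want to run a t-test for each row (compound) on columns [1:3] versus [4:6] and store all the p-values. Basically, I want to see whether the AC and AM groups differ for each compound.
I know there is another thread on this, but I could not find a workable solution to my problem there.
PS. My real dataset has about 35,000 rows (so it may need a different solution than one that works for 4 rows).
Answer 0 (score: 1)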
After selecting the columns of interest, use pmap (here pmap_dbl) to apply t.test on each row, taking the first 3 and the last 3 observations as the two inputs to t.test, then bind the extracted p-value to the original data as another column:
library(tidyverse)
df %>%
  select(AC.1:AM.3) %>%
  # c(...) collects the six values of one row into a single vector
  pmap_dbl(~ c(...) %>%
    {t.test(.[1:3], .[4:6])$p.value}) %>%
  bind_cols(df, pval_AC_AM = .)
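Alternatively, after selecting the columns, gather them into 'long' format, spread by group, apply t.test inside summarise, and join the result back to the original data:
df %>%
  select(compound, AC.1:AM.3) %>%
  gather(key, val, -compound) %>%
  # split "AC.1" into group ("AC") and replicate ("1")
  separate(key, into = c('key1', 'key2')) %>%
  spread(key1, val) %>%
  group_by(compound) %>%
  summarise(pval_AC_AM = t.test(AC, AM)$p.value) %>%
  right_join(df, by = "compound")
If some rows contain only a single unique value, t.test throws an error. One option is to run t.test anyway and get NA for those rows, which can be done with possibly from purrr.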
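For illustration, here is a minimal sketch of the possibly approach, assuming purrr is installed. The toy data and the helper name safe_pval are made up for this example and are not part of the original answer; the second row is constant on purpose so that t.test fails on it.

```r
library(purrr)

# Toy data in the same layout as the question. Row 2 is constant, so
# t.test() stops with "data are essentially constant" on that row.
df <- data.frame(
  compound = c("alanine", "arginine", "asparagine"),
  AC.1 = c(1.2, 5, 0.1), AC.2 = c(-2.0, 5, 0.5), AC.3 = c(-0.7, 5, -0.1),
  AM.1 = c(-1.8, 5, 0.8), AM.2 = c(-0.8, 5, 1.9), AM.3 = c(0.1, 5, 0.3)
)

# safe_pval is a hypothetical helper: it returns the p-value, or NA_real_
# whenever t.test() throws an error.
safe_pval <- possibly(function(x, y) t.test(x, y)$p.value,
                      otherwise = NA_real_)

# Apply it row by row over the six AC/AM columns (columns 2 to 7 of df).
df$pval_AC_AM <- pmap_dbl(df[2:7], function(...) {
  v <- c(...)            # the six values of the current row
  safe_pval(v[1:3], v[4:6])
})
df
```

The constant row should end up with NA in pval_AC_AM while the other rows get ordinary p-values, so a single bad row no longer aborts the whole run.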