Question

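I want to run column-wise t-tests between two data frames in R, i.e. t.test(df1$col1, df2$col1), t.test(df1$col2, df2$col2), and so on. The best option here seems to be the Map or mapply function. Something like:

mapply(t.test,tnav_DJF_histo.csv[,-1],tnav_DJF.csv[,-1])

works perfectly, but if one of the data frame columns contains NAs, it fails with this error:

Error in t.test.default(dots[[1L]][[1L]], dots[[2L]][[1L]]) : not enough 'y' observations

Question: how can I get this done with na.rm? For example, if a column in tnav_DJF.csv[,-1] contains NAs while the corresponding column in tnav_DJF_histo.csv[,-1] has none, how can I tell mapply to ignore or skip the analysis for those columns?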
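A minimal reproduction of this kind of failure may help. It assumes two small made-up data frames, `d1` and `d2` (hypothetical names, not the poster's data), where column `b` of `d2` is entirely NA. `t.test` needs at least two non-missing values in each sample, so `mapply` stops at that column and returns nothing at all:

```r
set.seed(1)
# hypothetical data: column b of d2 has no usable values
d1 <- data.frame(a = rnorm(10), b = rnorm(10))
d2 <- data.frame(a = rnorm(10), b = rep(NA_real_, 10))

# mapply calls t.test(d1$a, d2$a), then t.test(d1$b, d2$b);
# the second call raises an error, which tryCatch captures here
res <- tryCatch(
  mapply(t.test, d1, d2, SIMPLIFY = FALSE),
  error = function(e) conditionMessage(e)
)
print(res)
```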

Many thanks.

AEZ.

Answer 1

You can do this with mapply and an anonymous function, as follows:

Example data:

df1 <- data.frame(a=runif(20), b=runif(20), c=rep(NA,20))
df2 <- data.frame(a=runif(20), b=runif(20), c=c(NA,1:18,NA))
#notice df1's third column is just NAs

Solution:

Use mapply with an anonymous function, like this:

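# anonymous function: skip the pair if either column is entirely NA
# (SIMPLIFY = FALSE keeps the result a list of htest objects)
mapply(function(x, y) {
  if(all(is.na(x)) || all(is.na(y))) NULL else t.test(x, y, na.action=na.omit)
  }, df1, df2, SIMPLIFY = FALSE)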

Output:

$a

    Welch Two Sample t-test

data:  x and y
t = 1.4757, df = 37.337, p-value = 0.1484
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -0.0543192  0.3458648
sample estimates:
mean of x mean of y 
0.5217619 0.3759890 


$b

    Welch Two Sample t-test

data:  x and y
t = 1.1689, df = 37.7, p-value = 0.2498
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -0.0815067  0.3041051
sample estimates:
mean of x mean of y 
0.5846343 0.4733351 


$c
NULL

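P.S. t.test has no na.rm argument. It only has an na.action argument (used by the formula interface), but even if you set it to na.omit (as I did), you will still get an error when all the elements of a column are NA.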
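Note that `t.test` also fails when a column is not entirely NA but has fewer than two non-missing values, so an all-NA check can still let some failing columns through. A slightly stricter guard counts the usable values instead. This is a sketch; `safe_t` and the cut-off `min_n = 2` are illustrative choices, not part of the answer above, and the data are made up:

```r
# returns NULL instead of calling t.test when either sample
# has fewer than min_n non-NA values (hypothetical helper)
safe_t <- function(x, y, min_n = 2) {
  if (sum(!is.na(x)) < min_n || sum(!is.na(y)) < min_n) return(NULL)
  t.test(x, y)  # t.test.default drops any remaining NAs itself
}

# column b of df1 has only one non-NA value
df1 <- data.frame(a = runif(20), b = c(0.5, rep(NA, 19)))
df2 <- data.frame(a = runif(20), b = runif(20))

res <- mapply(safe_t, df1, df2, SIMPLIFY = FALSE)
sapply(res, is.null)  # a: FALSE, b: TRUE
```

With the all-NA check, column `b` here would still reach `t.test` and fail with "not enough 'x' observations".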

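P.S.2 If only some elements of x or y are NA, t.test runs correctly by omitting those elements. If you want to skip the t.test whenever a column contains even a single NA, change all to any in the function above.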
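For instance, the stricter variant from P.S.2 only swaps `all` for `any`, so a column pair is tested only when neither column has a single NA. The toy data below are made up for illustration:

```r
df1 <- data.frame(a = runif(20), b = c(NA, runif(19)))
df2 <- data.frame(a = runif(20), b = runif(20))

# skip the pair if *any* value in either column is NA
res <- mapply(function(x, y) {
  if (any(is.na(x)) || any(is.na(y))) NULL else t.test(x, y)
}, df1, df2, SIMPLIFY = FALSE)

sapply(res, is.null)  # a: FALSE, b: TRUE
```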

Answer 2

Could you do something like this?

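t.test2 <- function(col1, col2){
  # keep only the rows where both columns are non-NA
  df <- cbind(col1, col2)
  df <- df[complete.cases(df), , drop = FALSE]
  # too few complete rows: skip this column
  if(nrow(df) < 3){return(NA)}
  t.test(df[, 1], df[, 2])
  }
mapply(t.test2, csv1[, -1], csv2[, -1], SIMPLIFY = FALSE)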
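A different technique, not used in either answer, is to let `t.test` itself decide: wrap each call in `tryCatch` and return `NA` for any column where it raises an error (too few observations, for example). This sketch keeps only the p-value per column; `try_t` is a hypothetical helper name and the data are made up:

```r
# p-value of a two-sample t-test, or NA if t.test raises an error
try_t <- function(x, y) {
  tryCatch(t.test(x, y)$p.value, error = function(e) NA_real_)
}

df1 <- data.frame(a = runif(20), b = rep(NA_real_, 20))
df2 <- data.frame(a = runif(20), b = runif(20))

# one p-value per column, named after the columns; NA for b
pvals <- mapply(try_t, df1, df2)
pvals
```

The trade-off is that any error is silently turned into `NA`, so a genuine data problem in a column is hidden rather than reported.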

Removing NAs when doing t-tests with lapply in R
