Question

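I have a data frame that summarises the number of missing and non-missing observations in a data frame, built with the function described in [1]. I have since been asked to test for differences between the two treatment groups in my data (personally I don't agree with the need for, or the utility of, doing this, but it is what I have been asked to do). So I wrote a small function to do it...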

quick.test <- function(x, y){
  chisq   <- chisq.test(x = x,  y = y)
  fisher  <- fisher.test(x = x, y = y)
  results <- cbind(chisq  = chisq$statistic,
                   df     = chisq$parameter,
                   p      = chisq$p.value,
                   fisher = fisher$p.value)
  results
}

I then use apply() to pass the relevant columns to this function, like so...

apply(miss.t1, 1, function(x) quick.test(x[2:3], x[4:5]))

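This works for the miss.t1 data frame used above, but I am working with time-series data and there are three time points I wish to summarise, so there are also miss.t2 and miss.t3 (each summarises the number of present/missing observations at its time point, and each was created in the same way with the function described in [1]).

miss.t2 fails with the following error...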

apply(miss.t2, 1, function(x) quick.test(x[2:3], x[4:5]))
Error in chisq.test(x = x, y = y) : 
  'x' and 'y' must have at least 2 levels

My first thought was that one of the columns contained missing values for some reason, but that does not appear to be the case...

> describe(miss.t2)
miss.t2 

 5  Variables      171  Observations
--------------------------------------------------------------------------------
variable 
      n missing  unique 
    171       0     171 

lowest : Abtotal   Abyn      agg_ment  agg_phys  All.score
highest: z_pf      z_re      z_rp      z_sf      z_vt      
--------------------------------------------------------------------------------
nmiss.1 
      n missing  unique    Mean 
    171       0       4   8.649 

0 (6, 4%), 8 (9, 5%), 9 (153, 89%), 10 (3, 2%) 
--------------------------------------------------------------------------------
npresent.1 
      n missing  unique    Mean 
    171       0       4   9.351 

8 (3, 2%), 9 (153, 89%), 10 (9, 5%), 18 (6, 4%) 
--------------------------------------------------------------------------------
nmiss.2 
      n missing  unique    Mean 
    171       0       4   10.65 

0 (6, 4%), 11 (160, 94%), 12 (4, 2%), 13 (1, 1%) 
--------------------------------------------------------------------------------
npresent.2 
      n missing  unique    Mean 
    171       0       4   14.35 

12 (1, 1%), 13 (4, 2%), 14 (160, 94%), 25 (6, 4%) 
--------------------------------------------------------------------------------

Next I tried running it on subsets of miss.t2 using head(miss.t2, n = XX). That works for the first 53 rows and fails as soon as row 54 is included...

> apply(head(miss.t2, n=53), 1, function(x) quick.test(x[2:3], x[4:5]))
     1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28
[1,] 0 0 0 0 0 0 0 0 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
[2,] 1 1 1 1 1 1 1 1 1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1
[3,] 1 1 1 1 1 1 1 1 1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1
[4,] 1 1 1 1 1 1 1 1 1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1
     29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53
[1,]  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
[2,]  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1
[3,]  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1
[4,]  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1
There were 50 or more warnings (use warnings() to see the first 50)
> apply(head(miss.t2, n=54), 1, function(x) quick.test(x[2:3], x[4:5]))
Error in chisq.test(x = x, y = y) : 
  'x' and 'y' must have at least 2 levels
> miss.t2[54,]
   variable nmiss.1 npresent.1 nmiss.2 npresent.2
54      psq      10          8      11         14
> traceback()
5: stop("'x' and 'y' must have at least 2 levels") at #2
4: chisq.test(x = x, y = y) at #2
3: quick.test(x[2:3], x[4:5])
2: FUN(newX[, i], ...)
1: apply(head(miss.t2, n = 54), 1, function(x) quick.test(x[2:3], 
       x[4:5]))

It is similar at the "bottom" of the data frame: the last 26 rows are processed fine, but not once the 27th row from the end is included...

> apply(tail(miss.t2, n=26), 1, function(x) quick.test(x[2:3], x[4:5]))
     146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163
[1,]   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
[2,]   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1
[3,]   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1
[4,]   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1
     164 165 166 167 168 169 170 171
[1,]   0   0   0   0   0   0   0   0
[2,]   1   1   1   1   1   1   1   1
[3,]   1   1   1   1   1   1   1   1
[4,]   1   1   1   1   1   1   1   1
There were 26 warnings (use warnings() to see them)
> apply(tail(miss.t2, n=27), 1, function(x) quick.test(x[2:3], x[4:5]))
Error in chisq.test(x = x, y = y) : 
  'x' and 'y' must have at least 2 levels
In addition: Warning message:
In chisq.test(x = x, y = y) : Chi-squared approximation may be incorrect

> miss.t2[118,]
    variable nmiss.1 npresent.1 nmiss.2 npresent.2
118     sf16       9          9      11         14

I can't see anything wrong with these two rows that would explain why they fail, and the traceback() shown above doesn't reveal anything useful (to my eyes).
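Rather than bisecting with head() and tail(), the failing rows can be located in one pass by catching the error row by row. This is a minimal sketch only: the data frame `miss` and its values are made up as a stand-in for miss.t2, and the label column is dropped first so the counts stay numeric.

```r
# Hypothetical stand-in for miss.t2 (illustrative values, not the real data)
miss <- data.frame(variable   = c("a", "b", "c"),
                   nmiss.1    = c(10, 9, 8),
                   npresent.1 = c(8, 9, 10),
                   nmiss.2    = c(11, 11, 12),
                   npresent.2 = c(14, 14, 13),
                   stringsAsFactors = FALSE)

counts <- miss[, -1]  # numeric columns only

# TRUE for every row on which the two-vector call to chisq.test() errors
failed <- vapply(seq_len(nrow(counts)), function(i) {
  row <- unlist(counts[i, ])
  out <- tryCatch(suppressWarnings(chisq.test(x = row[1:2], y = row[3:4])),
                  error = function(e) e)
  inherits(out, "error")
}, logical(1))

miss[failed, ]  # rows to inspect
```

In this made-up example only the row whose first two counts are both 9 is flagged.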

Can anyone offer any suggestions as to why or where this is going wrong?

Many thanks,

Neil

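EDIT: Formatted reply to Vincent Zoonekynd...

I had opted for the chisq.test(x = x, y = y) form described in ?chisq.test, because building a matrix with cbind() as you suggest results in Error in sum(x) : invalid 'type' (character) of argument.

Adding print statements that show x and y and their lengths leads to the same error, but shows the values and lengths as...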

> miss.t2.res <- data.frame(t(apply(miss.t2, 1, function(x) quick.test(x[2:3], x[4:5])))) 
[1] "Your x is : 9" "Your x is : 9" 
[1] 2    ### < Length of x
[1] "Your y is : 11" "Your y is : 14"
[1] 2    ### < Length of y
Error in chisq.test(x = x, y = y) : 'x' and 'y' must have at least 2 levels
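The character error from sum(x) is consistent with how apply() handles a data frame: it converts the whole frame to a matrix first, and a single character column (here `variable`) turns every cell into a string. A small sketch with a made-up data frame (`df` is hypothetical, with counts borrowed from the rows shown above):

```r
df <- data.frame(variable   = c("psq", "sf16"),
                 nmiss.1    = c(10, 9),
                 npresent.1 = c(8, 9),
                 stringsAsFactors = FALSE)

# apply() calls as.matrix() on the data frame; the character label column
# forces every cell, including the counts, to character.
with_label <- apply(df, 1, function(row) class(row[2:3]))

# Dropping the label column first keeps the counts numeric.
counts_only <- apply(df[, -1], 1, function(row) class(row))

print(with_label)
print(counts_only)
```

Either dropping the label column before apply() or converting with as.numeric() inside the function avoids the character type.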

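EDIT 2: Thanks to Vincent Zoonekynd's hint, the problem was that two cells had the same count, so the call to chisq.test() treated the values as factors and collapsed them into a single level. The solution is to modify the quick.test() function so that it coerces the arguments to numeric and passes them as a matrix, so the function that now works is...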

quick.test <- function(x, y){
  chisq   <- chisq.test(rbind(as.numeric(x), as.numeric(y)))
  fisher  <- fisher.test(rbind(as.numeric(x), as.numeric(y)))
  results <- cbind(chisq  = chisq$statistic,
                   df     = chisq$parameter,
                   p      = chisq$p.value,
                   fisher = fisher$p.value)
  results
}
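The collapse can be reproduced outside apply(). The sketch below uses the counts from row 118 (9, 9, 11, 14); the variable names are illustrative only.

```r
x <- c(9, 9)    # nmiss.1 and npresent.1, both equal to 9
y <- c(11, 14)  # nmiss.2 and npresent.2

# Given two vectors, chisq.test() treats them as paired observations and
# converts each to a factor, so the two 9s become a single level.
nlevels(factor(x))

# Capture the error message instead of stopping
msg <- tryCatch(chisq.test(x = x, y = y),
                error = function(e) conditionMessage(e))
print(msg)

# A 2 x 2 matrix is read as a contingency table of counts instead.
tab   <- rbind(x, y)
fixed <- chisq.test(tab)
fixed$parameter  # degrees of freedom
```

Note that even when the two-vector form does not error, it cross-tabulates the values themselves rather than testing the counts, which would explain why every row that did run returned the same 0, 1, 1, 1 result.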

Many thanks for the help & pointers Vincent, much appreciated.

[1] http://gettinggeneticsdone.blogspot.co.uk/2011/02/summarize-missing-data-for-all.html

Answer 1

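The solution suggested by Vincent Zoonekynd in the comments above is to modify the quick.test() function so that it coerces the arguments to numeric and passes them as a matrix, so the function that now works is...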

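quick.test <- function(x, y){
  # as.numeric() undoes the character coercion done by apply();
  # rbind() builds a 2 x 2 table of counts (one row per group)
  chisq   <- chisq.test(rbind(as.numeric(x), as.numeric(y)))
  fisher  <- fisher.test(rbind(as.numeric(x), as.numeric(y)))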
  results <- cbind(chisq  = chisq$statistic,
                   df     = chisq$parameter,
                   p      = chisq$p.value,
                   fisher = fisher$p.value)
  results
}
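A short usage sketch follows, assuming a two-row stand-in for miss.t2 built from rows 54 and 118 shown in the question (the data frame `miss` is hypothetical). The label column is dropped before apply(), so the column indices become 1:2 and 3:4 rather than 2:3 and 4:5.

```r
quick.test <- function(x, y){
  chisq   <- chisq.test(rbind(as.numeric(x), as.numeric(y)))
  fisher  <- fisher.test(rbind(as.numeric(x), as.numeric(y)))
  cbind(chisq  = chisq$statistic,
        df     = chisq$parameter,
        p      = chisq$p.value,
        fisher = fisher$p.value)
}

# Stand-in data: counts copied from rows 54 and 118 of miss.t2
miss <- data.frame(variable   = c("psq", "sf16"),
                   nmiss.1    = c(10, 9),
                   npresent.1 = c(8, 9),
                   nmiss.2    = c(11, 11),
                   npresent.2 = c(14, 14),
                   stringsAsFactors = FALSE)

# One row of results per variable; apply() drops the column names,
# so they are restored by hand.
res <- t(apply(miss[, -1], 1, function(r) quick.test(r[1:2], r[3:4])))
dimnames(res) <- list(miss$variable, c("chisq", "df", "p", "fisher"))
res
```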
