Joining the results of two statistical tables into one table in R

Date: 2018-07-24 10:03:30

Tags: r dplyr data.table lapply tidyr

This continues my earlier question, "comparison Mann-Whitney test between groups", so I decided to create a new topic.

Rui Barradas's solution helped me calculate the Mann-Whitney test for groups 1-2 and 1-3.

lst <- split(mydat, mydat$group)
lapply(lst[-1], function(DF) wilcox.test(DF$var, lst[[1]]$var, exact = FALSE))

Now I want to get the descriptive statistics. I use the psych library:

describeBy(mydat$var,mydat$group)

This gives the following output:

group: 1
   vars n mean   sd median trimmed  mad min max range skew kurtosis   se
X1    1 4 23.5 0.58   23.5    23.5 0.74  23  24     1    0    -2.44 0.29
-------------------------------------------------------------------------------------- 
group: 2
   vars n mean   sd median trimmed  mad min max range skew kurtosis   se
X1    1 4 23.5 0.58   23.5    23.5 0.74  23  24     1    0    -2.44 0.29
-------------------------------------------------------------------------------------- 
group: 3
   vars n mean   sd median trimmed  mad min max range skew kurtosis   se
X1    1 4 23.5 0.58   23.5    23.5 0.74  23  24     1    0    -2.44 0.29

This is inconvenient. For each group I need only the mean, sd, median and the p-value of wilcox.test.

That is, I want this output:

        mean   sd     median  p-value
group1  23,5   0,58   23,5    -
group2  23,5   0,58   23,5    1
group3  23,5   0,58   23,5    1

How can I do this?

Edit

structure(list(`1` = structure(list(vars = 1, n = 4, mean = 23.5, 
    sd = 0.577350269189626, median = 23.5, trimmed = 23.5, mad = 0.7413, 
    min = 23, max = 24, range = 1, skew = 0, kurtosis = -2.4375, 
    se = 0.288675134594813), .Names = c("vars", "n", "mean", 
"sd", "median", "trimmed", "mad", "min", "max", "range", "skew", 
"kurtosis", "se"), row.names = "X1", class = c("psych", "describe", 
"data.frame")), `2` = structure(list(vars = 1, n = 4, mean = 23.5, 
    sd = 0.577350269189626, median = 23.5, trimmed = 23.5, mad = 0.7413, 
    min = 23, max = 24, range = 1, skew = 0, kurtosis = -2.4375, 
    se = 0.288675134594813), .Names = c("vars", "n", "mean", 
"sd", "median", "trimmed", "mad", "min", "max", "range", "skew", 
"kurtosis", "se"), row.names = "X1", class = c("psych", "describe", 
"data.frame")), `3` = structure(list(vars = 1, n = 4, mean = 23.5, 
    sd = 0.577350269189626, median = 23.5, trimmed = 23.5, mad = 0.7413, 
    min = 23, max = 24, range = 1, skew = 0, kurtosis = -2.4375, 
    se = 0.288675134594813), .Names = c("vars", "n", "mean", 
"sd", "median", "trimmed", "mad", "min", "max", "range", "skew", 
"kurtosis", "se"), row.names = "X1", class = c("psych", "describe", 
"data.frame"))), .Dim = 3L, .Dimnames = structure(list(group = c("1", 
"2", "3")), .Names = "group"), call = by.default(data = x, INDICES = group, 
    FUN = describe, type = type), class = c("psych", "describeBy"
))

4 Answers:

Answer 0 (score: 2)

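Using the data linked in the question and the split instruction above, the following produces the desired output.

I repeat the test in order to assign its result to wt_list.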

wt_list <- lapply(lst[-1], function(DF) wilcox.test(DF$var, lst[[1]]$var, exact = FALSE))

mu <- tapply(mydat$var, mydat$group, mean)
s  <- tapply(mydat$var, mydat$group, sd)
md <- tapply(mydat$var, mydat$group, median)

pval <- c(NA, sapply(wt_list, '[[', "p.value"))

df_smry <- data.frame(mean = mu, sd = s, median = md, p.value = pval)

df_smry
#  mean        sd median p.value
#1 23.5 0.5773503   23.5      NA
#2 23.5 0.5773503   23.5       1
#3 23.5 0.5773503   23.5       1
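
For readers without the linked data, the same steps can be run end to end on a stand-in data set. The `mydat` below is an assumption: four values per group, chosen so that they reproduce the mean, sd and median printed in the question. It is not the asker's actual data. The helper name `summarise_vs_first` is likewise hypothetical.

```r
# Hypothetical stand-in for the asker's data (assumption, not the original):
# four values per group that give mean 23.5, sd 0.577 and median 23.5.
mydat <- data.frame(
  group = rep(1:3, each = 4),
  var   = rep(c(23, 24, 23, 24), times = 3)
)

# Hypothetical helper: per-group mean/sd/median plus the Mann-Whitney
# p-value of each group against the first group (NA for the first group).
summarise_vs_first <- function(x, g) {
  lst <- split(x, g)
  pval <- c(NA, unname(sapply(lst[-1], function(v) {
    wilcox.test(v, lst[[1]], exact = FALSE)$p.value
  })))
  data.frame(
    mean    = sapply(lst, mean),
    sd      = sapply(lst, sd),
    median  = sapply(lst, median),
    p.value = pval
  )
}

df_smry <- summarise_vs_first(mydat$var, mydat$group)
df_smry
```

Wrapping the steps in a function keeps the reference group (the first level) in one place, so the same call works for any number of groups.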

Answer 1 (score: 2)

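You can use the tidyverse together with broom. tidy() returns the test results as a data.frame. We add the missing group value using complete. Then we calculate the descriptive statistics using dplyr's group_by and summarise_all, and join the result to the p.values. If necessary, you can filter at the end to get the expected output.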

library(tidyverse)
mydat %>% 
  with(., pairwise.wilcox.test(var, group, exact = FALSE)) %>% 
  broom::tidy() %>% 
  complete(group1 = factor(mydat$group)) %>% 
  left_join(mydat %>% 
              group_by(group=as.character(group)) %>% 
              summarise_all(c("mean", "sd", "median")), 
            by=c("group1"="group"))
# A tibble: 4 x 6
  group1 group2 p.value  mean    sd median
  <chr>  <chr>    <dbl> <dbl> <dbl>  <dbl>
1 1      NA          NA  23.5 0.577   23.5
2 2      1            1  23.5 0.577   23.5
3 3      1            1  23.5 0.577   23.5
4 3      2            1  23.5 0.577   23.5
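
`summarise_all()` has since been superseded in dplyr by `across()`. A minimal sketch of the same idea with `across()`, restricted to the comparisons against group 1 that the question asks for, might look like the following. It assumes dplyr 1.0.0 or later, and `mydat` is a hypothetical stand-in built to match the summary values in the question, not the original data.

```r
library(dplyr)

# Hypothetical stand-in data (assumption): same summary values as in the question.
mydat <- data.frame(
  group = rep(1:3, each = 4),
  var   = rep(c(23, 24, 23, 24), times = 3)
)

# Descriptive statistics per group; group is made character for the join.
stats <- mydat %>%
  mutate(group = as.character(group)) %>%
  group_by(group) %>%
  summarise(across(var, list(mean = mean, sd = sd, median = median),
                   .names = "{.fn}"))

# p-values of groups 2 and 3 against group 1, as in the question.
lst <- split(mydat$var, mydat$group)
pvals <- sapply(lst[-1], function(v) {
  wilcox.test(v, lst[[1]], exact = FALSE)$p.value
})
pvals <- tibble(group = names(pvals), p.value = unname(pvals))

# Group 1 has no row in pvals, so left_join() fills its p.value with NA.
res <- left_join(stats, pvals, by = "group")
res
```

Because only the comparisons against group 1 are computed, there is no extra "3 vs 2" row to filter out afterwards.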

Answer 2 (score: 1)

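Does this work for you? As mentioned above, there is a problem with your dput.

I had to use unlist on each group to be able to use rbind, followed by a simple select from dplyr.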

dat <- structure(list(`1` = structure(list(vars = 1, n = 4, mean = 23.5,
    sd = 0.577350269189626, median = 23.5, trimmed = 23.5, mad = 0.7413,
    min = 23, max = 24, range = 1, skew = 0, kurtosis = -2.4375,
    se = 0.288675134594813), .Names = c("vars", "n", "mean",
"sd", "median", "trimmed", "mad", "min", "max", "range", "skew",
"kurtosis", "se"), row.names = "X1", class = c("psych", "describe",
"data.frame")), `2` = structure(list(vars = 1, n = 4, mean = 23.5,
    sd = 0.577350269189626, median = 23.5, trimmed = 23.5, mad = 0.7413,
    min = 23, max = 24, range = 1, skew = 0, kurtosis = -2.4375,
    se = 0.288675134594813), .Names = c("vars", "n", "mean",
"sd", "median", "trimmed", "mad", "min", "max", "range", "skew",
"kurtosis", "se"), row.names = "X1", class = c("psych", "describe",
"data.frame")), `3` = structure(list(vars = 1, n = 4, mean = 23.5,
    sd = 0.577350269189626, median = 23.5, trimmed = 23.5, mad = 0.7413,
    min = 23, max = 24, range = 1, skew = 0, kurtosis = -2.4375,
    se = 0.288675134594813), .Names = c("vars", "n", "mean",
"sd", "median", "trimmed", "mad", "min", "max", "range", "skew",
"kurtosis", "se"), row.names = "X1", class = c("psych", "describe",
"data.frame"))), .Dim = 3L, .Dimnames = structure(list(group = c("1",
"2", "3")), .Names = "group"), class = c("psych", "describeBy"
))

library(tidyverse)

rbind(unlist(dat[[1]]),unlist(dat[[2]]),unlist(dat[[3]])) %>% 
  as.data.frame() %>% 
  select(mean, sd, median)
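
With more than three groups, writing out one unlist() call per group gets tedious. A minimal base R sketch of the same idea uses lapply() and do.call(rbind, ...). The list `dat_list` below is a hypothetical, simplified stand-in for the describeBy result (plain one-row data frames without the psych classes and with only a few of the columns), so treat it as an illustration rather than a drop-in replacement.

```r
# Hypothetical, simplified stand-in for the describeBy() result (assumption):
# a named list holding one single-row data frame per group.
one_group <- data.frame(n = 4, mean = 23.5, sd = 0.577350269189626, median = 23.5)
dat_list <- list(`1` = one_group, `2` = one_group, `3` = one_group)

# unlist() each group into a named numeric vector, then stack the vectors as rows.
stats <- as.data.frame(do.call(rbind, lapply(dat_list, unlist)))

# Keep only the columns of interest, selected by name.
smry <- stats[, c("mean", "sd", "median")]
smry
```

The group labels are carried over from the list names to the row names, so no separate group column has to be built by hand.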

Answer 3 (score: 1)

Another way to achieve this is to add the argument mat to describeBy:

    describeBy(mydat$var, mydat$group, mat = TRUE)

    # First I used the data and the code from the link:
    lst <- split(mydat, mydat$group)
    .ls <- lapply(lst[-1], function(DF) wilcox.test(DF$var, lst[[1]]$var, exact = FALSE))

    # Then I extracted values of p.values
    .ls <- c("-", sapply(.ls, '[[', "p.value"))

    # And finally I combined desired columns with extracted p.values
    cbind(describeBy(mydat$var, mydat$group, mat = TRUE)[c(5, 6, 7)], "p.value" =.ls)

    # And the output:

           mean        sd median p.value
        11 23.5 0.5773503   23.5       -
        12 23.5 0.5773503   23.5       1
        13 23.5 0.5773503   23.5       1
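
Two small variations on the code above may be worth considering: selecting the describeBy columns by name rather than by position, and using NA instead of "-" for the reference group so that the p.value column stays numeric. A minimal sketch, assuming the psych package is installed and using a hypothetical `mydat` built to match the summary values in the question (not the original data):

```r
library(psych)

# Hypothetical stand-in data (assumption): same summary values as in the question.
mydat <- data.frame(
  group = rep(1:3, each = 4),
  var   = rep(c(23, 24, 23, 24), times = 3)
)

# p-values of groups 2 and 3 against group 1; NA for the reference group.
lst <- split(mydat$var, mydat$group)
pval <- c(NA, unname(sapply(lst[-1], function(v) {
  wilcox.test(v, lst[[1]], exact = FALSE)$p.value
})))

# mat = TRUE returns one data frame with a row per group;
# the wanted columns are picked by name instead of by index.
desc <- describeBy(mydat$var, mydat$group, mat = TRUE)
res <- cbind(desc[c("mean", "sd", "median")], p.value = pval)
res
```

Selecting by name avoids depending on the column order of the describeBy output, and a numeric p.value column can still be rounded or filtered later.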