Sum a column when two other columns match, then convert to a wide count table

Posted: 2019-07-24 15:01:12

Tags: r

I have an R data frame similar to this:

sample.data <- data.frame(Sample = c(1,1,1,2,2,2,3,3,3,4,4),
                          Count = c(1,76,73,26,89,29,3,34,45,94,50),
                          Condition = c("A","B","B","A","D","A","B","B","A","A","A"))
sample.data

   Sample Count Condition
1       1     1         A
2       1    76         B
3       1    73         B
4       2    26         A
5       2    89         D
6       2    29         A
7       3     3         B
8       3    34         B
9       3    45         A
10      4    94         A
11      4    50         A

I would like to: 1) sum the "Count" column whenever the "Sample" and "Condition" columns are the same, so that it looks like this:

  Sample Count Condition
1      1     1         A
2      1   149         B
3      2    55         A
4      2    89         D
5      3    37         B
6      3    45         A
7      4   144         A

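A minimal base R sketch of step 1 on its own, assuming only the sample.data shown above (re-created here so the snippet runs by itself): aggregate() with a formula sums Count within each Sample/Condition pair. The name summed is illustrative, and rows are sorted by Sample and then Condition, so within sample 3 the A row comes before the B row.

```r
sample.data <- data.frame(Sample = c(1,1,1,2,2,2,3,3,3,4,4),
                          Count = c(1,76,73,26,89,29,3,34,45,94,50),
                          Condition = c("A","B","B","A","D","A","B","B","A","A","A"))

# Sum Count for every combination of the grouping variables on the right-hand side
summed <- aggregate(Count ~ Sample + Condition, data = sample.data, FUN = sum)

# Sort by Sample, then Condition, and reset the row names to 1..n
summed <- summed[order(summed$Sample, summed$Condition), ]
rownames(summed) <- NULL
summed
```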
2) Then convert it into a wide table, for example:

Condition   1  2  3   4
A           1 55 45 144
B         149  0 37   0
D           0 89  0   0

3) Finally, averages

Can I create another data frame that has the same "Condition" column, followed by two columns holding the averages of samples (1-2) and (3-4)?

Like this:

  Condition AV12 AV34
1         A 28.0 94.5
2         B 74.5 18.5
3         D 44.5  0.0

2 answers:

Answer 0 (score: 2)

We group by "Sample" and "Condition", take the sum of "Count", and then spread it into "wide" format

library(tidyverse)
sample.data %>%
    group_by(Sample, Condition) %>% 
    summarise(Count = sum(Count)) %>% 
    spread(Sample, Count, fill = 0)
# A tibble: 3 x 5
#  Condition   `1`   `2`   `3`   `4`
#  <fct>     <dbl> <dbl> <dbl> <dbl>
#1 A             1    55    45   144
#2 B           149     0    37     0
#3 D             0    89     0     0
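spread() still works but is superseded in current tidyr. A sketch of the same reshaping with pivot_wider(), assuming tidyr >= 1.1.0 (for a scalar values_fill) and dplyr >= 1.0.0 (for .groups in summarise()); loading dplyr and tidyr instead of the whole tidyverse is a choice, not a requirement, and the name wide is illustrative.

```r
library(dplyr)
library(tidyr)

sample.data <- data.frame(Sample = c(1,1,1,2,2,2,3,3,3,4,4),
                          Count = c(1,76,73,26,89,29,3,34,45,94,50),
                          Condition = c("A","B","B","A","D","A","B","B","A","A","A"))

wide <- sample.data %>%
  group_by(Sample, Condition) %>%
  # .groups = "drop" returns an ungrouped tibble sorted by Sample, Condition
  summarise(Count = sum(Count), .groups = "drop") %>%
  # One column per Sample; Sample/Condition cells with no rows become 0
  pivot_wider(names_from = Sample, values_from = Count, values_fill = 0)
wide
```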

Or use xtabs from base R

out <- xtabs(Count ~ Condition + Sample, sample.data)
#       Sample
#Condition   1   2   3   4
#        A   1  55  45 144
#        B 149   0  37   0
#        D   0  89   0   0

If we need the row-wise means of each pair of columns

# Average Samples 1-2 and 3-4 within each Condition (row of the table)
out1 <- cbind(rowMeans(out[, 1:2]), rowMeans(out[, 3:4]))
colnames(out1) <- paste0("AV", c(12, 34))
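The question asks for a data frame with a Condition column, whereas out1 is a matrix with the conditions as row names. A minimal base R sketch that builds that data frame straight from the xtabs table; av_df is a hypothetical name, and the columns are picked by their Sample labels rather than by position.

```r
sample.data <- data.frame(Sample = c(1,1,1,2,2,2,3,3,3,4,4),
                          Count = c(1,76,73,26,89,29,3,34,45,94,50),
                          Condition = c("A","B","B","A","D","A","B","B","A","A","A"))

# Condition x Sample table of summed Count (empty cells are 0)
out <- xtabs(Count ~ Condition + Sample, sample.data)

# Hypothetical result name av_df; Condition comes from the table's row labels
av_df <- data.frame(Condition = rownames(out),
                    AV12 = rowMeans(out[, c("1", "2")]),
                    AV34 = rowMeans(out[, c("3", "4")]),
                    row.names = NULL,
                    stringsAsFactors = FALSE)
av_df
```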

Or with tapply

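# Condition in rows, Sample in columns; default = 0 (R >= 3.4.0) fills empty cells, which would otherwise be NA
tapply(sample.data$Count, sample.data[c("Condition", "Sample")], sum, default = 0)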

Answer 1 (score: 0)

For future reference, if there are several "Condition" variables, the code looks like this:

sample.data <- data.frame(Sample = c(1,1,1,2,2,2,3,3,3,4,4),
                          Count = c(1,76,73,26,89,29,3,34,45,94,50),
                          Condition1 = c("A","B","B","A","D","A","B","B","A","A","A"),
                          Condition2 = c("A1", "B1", "B2", "A1", "D1", "A2", "B1", "B2", "A1", "A2", "A3"))

library(tidyverse)
data <- sample.data %>%
    group_by(Sample, Condition1, Condition2) %>%
    summarise(Count = sum(Count)) %>%
    spread(Sample, Count, fill = 0)
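Combining the multi-condition reshaping with step 3 of the question: a sketch, assuming dplyr >= 1.0.0 and tidyr >= 1.1.0, that widens with pivot_wider() (every column other than Sample and Count becomes an id column) and then computes AV12 and AV34 as plain averages of the Sample columns. wide2 is an illustrative name.

```r
library(dplyr)
library(tidyr)

sample.data <- data.frame(Sample = c(1,1,1,2,2,2,3,3,3,4,4),
                          Count = c(1,76,73,26,89,29,3,34,45,94,50),
                          Condition1 = c("A","B","B","A","D","A","B","B","A","A","A"),
                          Condition2 = c("A1","B1","B2","A1","D1","A2","B1","B2","A1","A2","A3"))

wide2 <- sample.data %>%
  # Sum Count for every Sample/Condition1/Condition2 combination
  group_by(Sample, Condition1, Condition2) %>%
  summarise(Count = sum(Count), .groups = "drop") %>%
  # One column per Sample; combinations with no rows become 0
  pivot_wider(names_from = Sample, values_from = Count, values_fill = 0) %>%
  # Average Samples 1-2 and 3-4 within each row
  mutate(AV12 = (`1` + `2`) / 2,
         AV34 = (`3` + `4`) / 2) %>%
  select(Condition1, Condition2, AV12, AV34)
wide2
```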