I have an R data frame similar to this:
sample.data <- data.frame(Sample = c(1,1,1,2,2,2,3,3,3,4,4),
Count = c(1,76,73,26,89,29,3,34,45,94,50),
Condition = c("A","B","B","A","D","A","B","B","A","A","A"))
sample.data
   Sample Count Condition
1       1     1         A
2       1    76         B
3       1    73         B
4       2    26         A
5       2    89         D
6       2    29         A
7       3     3         B
8       3    34         B
9       3    45         A
10      4    94         A
11      4    50         A
I want to: 1) sum the Count column wherever the Sample and Condition columns match, so that the result looks like this:
  Sample Count Condition
1      1     1         A
2      1   149         B
3      2    55         A
4      2    89         D
5      3    37         B
6      3    45         A
7      4   144         A
2) then convert it to a wide table, for example:
Condition   1   2   3   4
A           1  55  45 144
B         149   0  37   0
D           0  89   0   0
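3) Finally, take averages.
Can I create another data frame that keeps the same Condition column and has two columns holding the averages of samples (1-2) and (3-4)?
Like this: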
              Sample
  Condition AV12 AV34
1         A   28 94.5
2         B 74.5 18.5
3         D 44.5    0
Answer 0 (score: 2)
We group by 'Sample' and 'Condition', take the sum of 'Count', and then spread the result into 'wide' format:
library(tidyverse)
sample.data %>%
group_by(Sample, Condition) %>%
summarise(Count = sum(Count)) %>%
spread(Sample, Count, fill = 0)
# A tibble: 3 x 5
# Condition `1` `2` `3` `4`
# <fct> <dbl> <dbl> <dbl> <dbl>
#1 A 1 55 45 144
#2 B 149 0 37 0
#3 D 0 89 0 0
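In current tidyr releases, spread is marked as superseded in favour of pivot_wider. A minimal sketch of the same pipeline with pivot_wider follows, assuming dplyr 1.0.0 or later (for the .groups argument) and tidyr 1.0.0 or later; the name wide is only illustrative.

```r
library(dplyr)
library(tidyr)

sample.data <- data.frame(Sample = c(1,1,1,2,2,2,3,3,3,4,4),
                          Count = c(1,76,73,26,89,29,3,34,45,94,50),
                          Condition = c("A","B","B","A","D","A","B","B","A","A","A"))

wide <- sample.data %>%
  group_by(Sample, Condition) %>%
  # .groups = "drop" returns an ungrouped tibble (assumes dplyr >= 1.0.0)
  summarise(Count = sum(Count), .groups = "drop") %>%
  # one column per Sample; combinations that never occur are filled with 0
  pivot_wider(names_from = Sample, values_from = Count,
              values_fill = list(Count = 0))

wide
```

The resulting columns are named `1` to `4`, so they need backticks when referenced by name, for example wide$`3`.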
Or use xtabs from base R:
out <- xtabs(Count ~ Condition + Sample, sample.data)
#          Sample
#Condition   1   2   3   4
#        A   1  55  45 144
#        B 149   0  37   0
#        D   0  89   0   0
If we need the row-wise means of each pair of columns:
out1 <- cbind(rowMeans(out[, 1:2]), rowMeans(out[, 3:4]))
colnames(out1) <- paste0("AV", c(12, 34))
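The cbind result above is a matrix whose row names carry the conditions. Since the question asks for a data frame with its own Condition column, one possible sketch is shown below. The name av.data is hypothetical, and the columns are selected by sample label rather than by position.

```r
sample.data <- data.frame(Sample = c(1,1,1,2,2,2,3,3,3,4,4),
                          Count = c(1,76,73,26,89,29,3,34,45,94,50),
                          Condition = c("A","B","B","A","D","A","B","B","A","A","A"))

# Summed Count for every Condition x Sample combination (0 where absent)
out <- xtabs(Count ~ Condition + Sample, sample.data)

# av.data is an illustrative name; columns are picked by their sample labels
av.data <- data.frame(Condition = rownames(out),
                      AV12 = unname(rowMeans(out[, c("1", "2")])),
                      AV34 = unname(rowMeans(out[, c("3", "4")])),
                      row.names = NULL)
av.data
```

Selecting by label ("1", "2") instead of 1:2 keeps the averages tied to the intended samples even if the column order of the table changes.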
Or build the summed table with tapply:
tapply(sample.data$Count, sample.data[c(3, 1)], sum)
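One difference from xtabs is worth noting: tapply returns NA, not 0, for Condition/Sample combinations that do not occur in the data. A small sketch of replacing those NA cells with 0 before averaging follows; the name tab is only illustrative.

```r
sample.data <- data.frame(Sample = c(1,1,1,2,2,2,3,3,3,4,4),
                          Count = c(1,76,73,26,89,29,3,34,45,94,50),
                          Condition = c("A","B","B","A","D","A","B","B","A","A","A"))

# Rows are Condition, columns are Sample; absent combinations come back as NA
tab <- tapply(sample.data$Count, sample.data[c("Condition", "Sample")], sum)

# Treat absent combinations as a count of 0 so that rowMeans() is not NA
tab[is.na(tab)] <- 0
tab
```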
Answer 1 (score: 0)
For future use, if there is more than one 'Condition' variable, the code looks like this:
sample.data <- data.frame(Sample = c(1,1,1,2,2,2,3,3,3,4,4),
Count = c(1,76,73,26,89,29,3,34,45,94,50),
Condition1 = c("A","B","B","A","D","A","B","B","A","A","A"),
Condition2 = c("A1", "B1", "B2", "A1", "D1", "A2", "B1", "B2", "A1", "A2", "A3"))
library(tidyverse)
data <- sample.data %>%
group_by(Sample, Condition1, Condition2) %>%
summarise(Count = sum(Count)) %>%
spread(Sample, Count, fill = 0)
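To carry the averaging step from the question over to this multi-condition case, the pair means can be computed directly on the wide table. This is only a sketch under stated assumptions: it uses pivot_wider in place of spread (assuming tidyr 1.0.0 or later and dplyr 1.0.0 or later), and the names wide2 and avg are hypothetical.

```r
library(dplyr)
library(tidyr)

sample.data <- data.frame(Sample = c(1,1,1,2,2,2,3,3,3,4,4),
                          Count = c(1,76,73,26,89,29,3,34,45,94,50),
                          Condition1 = c("A","B","B","A","D","A","B","B","A","A","A"),
                          Condition2 = c("A1","B1","B2","A1","D1","A2","B1","B2","A1","A2","A3"))

wide2 <- sample.data %>%
  group_by(Sample, Condition1, Condition2) %>%
  summarise(Count = sum(Count), .groups = "drop") %>%
  # one column per Sample, 0 for combinations that never occur
  pivot_wider(names_from = Sample, values_from = Count,
              values_fill = list(Count = 0))

# Means of samples 1-2 and 3-4 for every Condition1/Condition2 pair
avg <- wide2 %>%
  mutate(AV12 = (`1` + `2`) / 2,
         AV34 = (`3` + `4`) / 2) %>%
  select(Condition1, Condition2, AV12, AV34)
avg
```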