Here is some dummy data:
class<-c("ab","ab","ad","ab","ab","ad","ab","ab","ad","ab","ad","ab","av")
otu<-c("ab","ac","ad","ab","ac","ad","ab","ac","ad","ab","ad","ac","av")
value<-c(0,1,12,13,300,1,2,3,4,0,0,2,4)
type<-c("b","c","d","a","b","c","d","d","d","c","b","a","a")
location<-c("b","c","d","a","b","d","d","d","d","c","b","a","a")
datafr1<-data.frame(class,otu,value,type,location)
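I want to get rid of any OTU if any of its replicates within a 'location' and 'type' group is 0, because I am interested in OTUs that are shared among all replicates within a group.
I want to calculate two things. First: the percentage abundance of 'value' for all OTUs shared within each 'location' and 'type' group (abundance). Second: the number of shared OTUs in each class (otu.freq).
Note that I want the OTUs grouped by 'class', not by OTU name (the names are not meaningful).
Expected output: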
class location type abundance otu.freq
ab a a 79 2
av a a 21 1
ab b b 100 1
ab c c 100 1
ad d c 100 1
ab d d 24 2
ad d d 76 2
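The expected abundance is each class's share of the non-zero value total within its location/type group, rounded to a whole percent. For location a / type a, for instance, the non-zero values are 13 and 2 for class ab and 4 for class av, so the group total is 19; ab gets round(100 * 15 / 19) = 79 and av gets round(100 * 4 / 19) = 21. A minimal base-R sketch of that arithmetic, assuming zero-value rows are simply dropped first (the names nz, group_total and res are illustrative, not from the question):

```r
# Dummy data from the question
class <- c("ab","ab","ad","ab","ab","ad","ab","ab","ad","ab","ad","ab","av")
otu <- c("ab","ac","ad","ab","ac","ad","ab","ac","ad","ab","ad","ac","av")
value <- c(0,1,12,13,300,1,2,3,4,0,0,2,4)
type <- c("b","c","d","a","b","c","d","d","d","c","b","a","a")
location <- c("b","c","d","a","b","d","d","d","d","c","b","a","a")
datafr1 <- data.frame(class, otu, value, type, location)

# Assumption: a zero replicate is handled by dropping that row
nz <- datafr1[datafr1$value != 0, ]

# Non-zero total of each location/type group, repeated on every row of the group
nz$group_total <- ave(nz$value, nz$location, nz$type, FUN = sum)

# group_total is constant within a location/type group, so adding it to the
# grouping variables does not split any group; it just carries it along
res <- aggregate(value ~ class + location + type + group_total, data = nz, FUN = sum)
res$otu.freq <- aggregate(value ~ class + location + type + group_total,
                          data = nz, FUN = length)$value
res$abundance <- round(100 * res$value / res$group_total)

# Same row and column order as the expected output
res <- res[order(res$location, res$type, res$class),
           c("class", "location", "type", "abundance", "otu.freq")]
print(res, row.names = FALSE)
```

The two aggregate calls use the same formula and data, so their rows line up and otu.freq can be copied across directly.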
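I have a much larger data frame and tried using dplyr on it, but I ran out of RAM, so I don't know whether it works.
The solution provided by @Akron below does not count cases where the abundance is 0, but it does not remove that OTU from the other replicates in the same group. If any OTU has an abundance of 0, it is not shared within that group, and I need to discount it entirely from both the abundance and the otu.freq calculations.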
library(dplyr)
so_many_shared3 <- datafr1 %>%
  group_by(class, location, type) %>%
  summarise(abundance = sum(value) / sum(datafr1[['value']]) * 100, otu.freq = sum(value != 0))
class location type abundance otu.freq
1 ab a a 4.3859649 2
2 ab b b 87.7192982 1
3 ab c c 0.2923977 1
4 ab d d 1.4619883 2
5 ad b b 0.0000000 0
6 ad d c 0.2923977 1
7 ad d d 4.6783626 2
8 av a a 1.1695906 1
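The requirement stated in the question, dropping an OTU from every replicate of a location/type group as soon as one of its replicates is 0, is a grouped filter rather than a row filter. A minimal base-R sketch, assuming the OTU identity is the otu column (grouping by class happens afterwards); has_zero and shared are illustrative names. On this dummy data no OTU mixes zero and non-zero replicates within one group, so it keeps the same 10 rows as filtering on value != 0, but on real data the two can differ.

```r
# Dummy data from the question
class <- c("ab","ab","ad","ab","ab","ad","ab","ab","ad","ab","ad","ab","av")
otu <- c("ab","ac","ad","ab","ac","ad","ab","ac","ad","ab","ad","ac","av")
value <- c(0,1,12,13,300,1,2,3,4,0,0,2,4)
type <- c("b","c","d","a","b","c","d","d","d","c","b","a","a")
location <- c("b","c","d","a","b","d","d","d","d","c","b","a","a")
datafr1 <- data.frame(class, otu, value, type, location)

# TRUE on every row whose OTU has at least one zero replicate
# in its location/type group (assumption: OTU identity = otu column)
has_zero <- ave(datafr1$value == 0, datafr1$location, datafr1$type, datafr1$otu,
                FUN = any)

# Keep only OTUs that are non-zero in all replicates of their group.
# dplyr equivalent: group_by(location, type, otu) %>% filter(all(value != 0))
shared <- datafr1[!has_zero, ]
shared
```

The abundance and otu.freq summaries can then be computed on shared instead of on the row-filtered data.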
Answer 0 (score: 1)
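There is an error in your aggregate call. If you want to count the OTU frequency, you should put otu on the left-hand side of the "~" sign. After that, you can merge the two tables with the join function from the plyr package.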
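# Count rows per class/location/type group (length also counts zero-value rows)
abund_shared_freq <- aggregate(otu ~ class + location + type, datafr1, length)
# abund_shared is the abundance table from the asker's own aggregate code (not shown here).
# Calling plyr::join directly avoids attaching plyr, which would mask dplyr::summarise.
plyr::join(abund_shared, abund_shared_freq, by = c("class", "location", "type"), type = "left")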
Output:
class location type abundance otu
1 ab a a 4.3859649 2
2 ab b b 87.7192982 2
3 ab c c 0.2923977 2
4 ab d d 1.4619883 2
5 ad b b 0.0000000 1
6 ad d c 0.2923977 1
7 ad d d 4.6783626 2
8 av a a 1.1695906 1
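Because length counts every row, the otu column here includes zero-value replicates (ab / b / b shows 2 although one of its two values is 0). If only non-zero replicates should be counted, a small variation of the same aggregate call counts v != 0 instead; a sketch with an anonymous counting function and the illustrative name otu_freq:

```r
# Dummy data from the question
class <- c("ab","ab","ad","ab","ab","ad","ab","ab","ad","ab","ad","ab","av")
otu <- c("ab","ac","ad","ab","ac","ad","ab","ac","ad","ab","ad","ac","av")
value <- c(0,1,12,13,300,1,2,3,4,0,0,2,4)
type <- c("b","c","d","a","b","c","d","d","d","c","b","a","a")
location <- c("b","c","d","a","b","d","d","d","d","c","b","a","a")
datafr1 <- data.frame(class, otu, value, type, location)

# Count only the non-zero replicates in each class/location/type group
otu_freq <- aggregate(value ~ class + location + type, data = datafr1,
                      FUN = function(v) sum(v != 0))
names(otu_freq)[names(otu_freq) == "value"] <- "otu.freq"
otu_freq
```

This matches the otu.freq column of the dplyr attempt in the question, including a 0 for ad / b / b, whose only replicate is 0.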
Answer 1 (score: 1)
You can use data.table:
library(data.table)
val <- sum(datafr1$value)                    # grand total of value
setDT(datafr1)[order(class, type),           # setDT converts datafr1 to a data.table in place
               list(abundance = sum(value) / val * 100, otu.freq = .N),
               by = .(class, location, type)]
Or with dplyr:
library(dplyr)
datafr1 %>%
  group_by(class, location, type) %>%
  summarise(abundance = sum(value) / sum(datafr1[['value']]) * 100, otu.freq = n())
# class location type abundance otu.freq
#1 ab a a 4.3859649 2
#2 ab b b 87.7192982 2
#3 ab c c 0.2923977 2
#4 ab d d 1.4619883 2
#5 ad b b 0.0000000 1
#6 ad d c 0.2923977 1
#7 ad d d 4.6783626 2
#8 av a a 1.1695906 1
Based on the new criteria, I'm updating this with the code suggested by the OP (@K.Brannen):
datafr1 %>%
  group_by(class, location, type) %>%
  summarise(abundance = sum(value) / sum(datafr1[['value']]) * 100,
            otu.freq = sum(value != 0))
Based on the updated expected output:
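datafr1 %>%
  filter(value != 0) %>%                   # drop zero-abundance replicates
  group_by(location, type) %>%
  mutate(value1 = sum(value)) %>%          # non-zero total per location/type group
  group_by(class, .add = TRUE) %>%         # dplyr >= 1.0: .add replaces the deprecated add
  summarise(abundance = round(100 * sum(value) / unique(value1)),
            otu.freq = n())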
# location type class abundance otu.freq
#1 a a ab 79 2
#2 a a av 21 1
#3 b b ab 100 1
#4 c c ab 100 1
#5 d c ad 100 1
#6 d d ab 24 2
#7 d d ad 76 2
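Since the full data set ran out of memory with dplyr, the same updated computation can also be sketched with data.table, which is often lighter on memory for grouped summaries. This is a sketch under the same assumption as the dplyr code (zero rows are simply dropped), not benchmarked on the real data; dt, nz and res are illustrative names. keyby sorts the result by location, type and class:

```r
library(data.table)

# Dummy data from the question
class <- c("ab","ab","ad","ab","ab","ad","ab","ab","ad","ab","ad","ab","av")
otu <- c("ab","ac","ad","ab","ac","ad","ab","ac","ad","ab","ad","ac","av")
value <- c(0,1,12,13,300,1,2,3,4,0,0,2,4)
type <- c("b","c","d","a","b","c","d","d","d","c","b","a","a")
location <- c("b","c","d","a","b","d","d","d","d","c","b","a","a")
dt <- as.data.table(data.frame(class, otu, value, type, location))

# Drop zero replicates, then add the per location/type total by reference
nz <- dt[value != 0]
nz[, group_total := sum(value), by = .(location, type)]

# Per-class share of the group total (rounded %) and number of non-zero rows
res <- nz[, .(abundance = round(100 * sum(value) / group_total[1]),
              otu.freq = .N),
          keyby = .(location, type, class)]
res
```

The := step adds the column without copying nz again, which is the main reason data.table tends to need less memory than a mutate() chain here.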