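I'm trying to compute the frequency of factor levels, but the information I need is spread across two different matrices.
The first (which I call df1 below) holds the abundance of certain species (coded as "sp") at different sites. However, I want to quantify how common each species trait is within each site. For example, if sp1 and sp2 account for 4 and 10, respectively, of the 14 individuals found at site 1, then I can use the second data frame (df2) to look up the trait level each species carries. The expected result, merged_df, is a four-column data frame with the site, the trait (factor), the trait level, and the frequency of that trait level at that site. See the attached image for a clearer picture. I have tried functions such as reshape::cast and tidyr::gather, but I couldn't figure it out.
Thanks in advance, everyone.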
df1 <- data.frame(sp1 = c(4, 10, 0),
                  sp2 = c(0, 4, 5),
                  sp3 = c(0, 0, 3))
rownames(df1) <- paste("site", 1:3, sep="")
str(df1)
'data.frame': 3 obs. of 3 variables:
$ sp1: num 4 10 0
$ sp2: num 0 4 5
$ sp3: num 0 0 3
df2 <- data.frame(t1 = c("a", "b", "c"),
                  t2 = c("z", "x", "y"),
                  t3 = c("m", "n", "o"))
rownames(df2) <- paste("sp", 1:3, sep="")
str(df2)
'data.frame': 3 obs. of 3 variables:
$ t1: Factor w/ 3 levels "a","b","c": 1 2 3
$ t2: Factor w/ 3 levels "x","y","z": 3 1 2
$ t3: Factor w/ 3 levels "m","n","o": 1 2 3
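Answer 0 (score: 0)
First, correcting your df1 so that it matches the layout you describe (site 1 has 4 individuals of sp1 and 10 of sp2):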
df1 <- data.frame(sp1 = c(4, 0, 0),
                  sp2 = c(10, 4, 0),
                  sp3 = c(0, 5, 3))
rownames(df1) <- paste("site", 1:3, sep="")
df2 <- data.frame(t1 = c("a", "b", "c"),
                  t2 = c("z", "x", "y"),
                  t3 = c("m", "n", "o"))
rownames(df2) <- paste("sp", 1:3, sep="")
library(tidyr)
library(dplyr)
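df2 %>%
  mutate(sp = rownames(.)) %>%                 # species names as a column
  gather(factor, level, -sp) %>%               # long format: one row per species x trait
  left_join(df1 %>%
              mutate(site = rownames(.)) %>%   # site names as a column
              gather(sp, val, -site),          # long format: one row per site x species
            by = "sp") %>%
  group_by(factor, site) %>%
  mutate(frequency = val / sum(val)) %>%       # share of the site's individuals
  ungroup() %>%
  arrange(site, factor, level) %>%
  select(site, factor, level, frequency)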
# # A tibble: 27 x 4
# site factor level frequency
# <chr> <chr> <chr> <dbl>
# 1 site1 t1 a 0.2857143
# 2 site1 t1 b 0.7142857
# 3 site1 t1 c 0.0000000
# 4 site1 t2 x 0.7142857
# 5 site1 t2 y 0.0000000
# 6 site1 t2 z 0.2857143
# 7 site1 t3 m 0.2857143
# 8 site1 t3 n 0.7142857
# 9 site1 t3 o 0.0000000
# 10 site2 t1 a 0.0000000
# # ... with 17 more rows
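As a quick sanity check on the first rows of that output, the site 1 frequencies for trait t1 can be recomputed by hand in base R. This is only a sketch: the names abund_site1, t1_levels and freq_t1_site1 are made up for illustration, and the numbers are copied from the corrected df1 and from df2.

```r
# Abundances at site 1 from the corrected df1: sp1 = 4, sp2 = 10, sp3 = 0
abund_site1 <- c(sp1 = 4, sp2 = 10, sp3 = 0)

# Level of trait t1 carried by each species (from df2)
t1_levels <- c(sp1 = "a", sp2 = "b", sp3 = "c")

# Sum abundance per trait level, then divide by the site total (14)
freq_t1_site1 <- tapply(abund_site1, t1_levels[names(abund_site1)], sum) /
  sum(abund_site1)

round(freq_t1_site1, 7)
# a = 0.2857143 (4/14), b = 0.7142857 (10/14), c = 0
```

These values agree with rows 1 to 3 of the tibble.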
Note that duplicate row names are not allowed, so I had to make site a column.
The order is slightly different; let me know if it needs fixing.
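For what it's worth, the same idea can be sketched with pivot_longer(), which tidyr now recommends over gather(). This variant is an assumption on my part rather than something the question asked for: it also sums abundances per trait level with summarise(), so that two species sharing a level would collapse into one row (in the toy data every level belongs to a single species, so the result has the same 27 rows). The names abund_long, trait_long and freq_by_level are hypothetical, and it assumes tidyr >= 1.0.0 and dplyr >= 1.0.0.

```r
library(dplyr)
library(tidyr)
library(tibble)

df1 <- data.frame(sp1 = c(4, 0, 0), sp2 = c(10, 4, 0), sp3 = c(0, 5, 3),
                  row.names = paste0("site", 1:3))
df2 <- data.frame(t1 = c("a", "b", "c"), t2 = c("z", "x", "y"),
                  t3 = c("m", "n", "o"),
                  row.names = paste0("sp", 1:3), stringsAsFactors = FALSE)

# One row per site x species
abund_long <- df1 %>%
  rownames_to_column("site") %>%
  pivot_longer(-site, names_to = "sp", values_to = "val")

# One row per species x trait
trait_long <- df2 %>%
  rownames_to_column("sp") %>%
  pivot_longer(-sp, names_to = "factor", values_to = "level")

# Newer dplyr versions may warn about a many-to-many join here;
# that is expected, since every species occurs at several sites
# and carries several traits.
freq_by_level <- trait_long %>%
  inner_join(abund_long, by = "sp") %>%
  group_by(site, factor, level) %>%
  summarise(val = sum(val), .groups = "drop") %>%   # pool species sharing a level
  group_by(site, factor) %>%
  mutate(frequency = val / sum(val)) %>%            # share within site and trait
  ungroup() %>%
  select(site, factor, level, frequency)
```

Within each site and trait the frequency column sums to 1, which is an easy way to check the result on real data.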