Calculating frequencies of factor levels from attributes in a second data.frame

Time: 2017-09-04 18:28:32

Tags: r plyr reshape tidyr reshape2

I am trying to calculate the frequencies of factor levels, but using information held in two different matrices.

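The first (which I call df1 below) holds the abundances of certain species (coded as "sp") at different sites. However, I want to quantify how common each species trait is within each site. For example, if sp 1 and sp 2 account for 4 and 10, respectively, of the 14 individuals found at site 1, then I can use the second data frame (df2) to look up the trait levels that each species has. The expected result, merged_df, is a four-column data frame with the site, the trait (factor), the trait level, and the frequency of that trait level at each site. See the attached figure for a clearer picture. I have tried some functions, such as reshape::cast and tidyr::gather, but I could not figure it out.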
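To make the arithmetic concrete, here is a minimal base R sketch for a single site and a single trait. The abundances come from the example above; the trait levels "a" and "b" are assumed for illustration only.

```r
# Abundances at site 1, taken from the example in the text.
abund <- c(sp1 = 4, sp2 = 10)

# Assumed trait levels for one trait, one level per species.
trait <- c(sp1 = "a", sp2 = "b")

# Sum the abundance within each trait level, then divide by the site total.
freq <- tapply(abund, trait, sum) / sum(abund)
print(round(freq, 3))
# a = 4/14 (about 0.286), b = 10/14 (about 0.714)
```

The same calculation then has to be repeated for every site and every trait, which is why reshaping both tables to long form is convenient.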

Thanks in advance, everyone.

df1 <- data.frame(sp1 = c(4, 10, 0),
              sp2 = c(0, 4, 5),
              sp3 = c(0, 0, 3))
rownames(df1) <- paste("site", 1:3, sep="")
str(df1)
'data.frame':   3 obs. of  3 variables:
$ sp1: num  4 10 0
$ sp2: num  0 4 5
$ sp3: num  0 0 3

df2 <- data.frame(t1 = c("a", "b", "c"),
              t2 = c("z", "x", "y"),
              t3 = c("m", "n", "o"))
rownames(df2) <- paste("sp", 1:3, sep="")
str(df2)
'data.frame':   3 obs. of  3 variables:
$ t1: Factor w/ 3 levels "a","b","c": 1 2 3
$ t2: Factor w/ 3 levels "x","y","z": 3 1 2
$ t3: Factor w/ 3 levels "m","n","o": 1 2 3

Please, click here to see a schematic description

1 answer:

Answer 0: (score: 0)

Correcting your df1 so that it matches your schematic:

df1 <- data.frame(sp1 = c(4, 0, 0),
                  sp2 = c(10, 4, 0),
                  sp3 = c(0, 5, 3))
rownames(df1) <- paste("site", 1:3, sep="")

df2 <- data.frame(t1 = c("a", "b", "c"),
                  t2 = c("z", "x", "y"),
                  t3 = c("m", "n", "o"))
rownames(df2) <- paste("sp", 1:3, sep="")
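library(tidyr)
library(dplyr)

# Reshape the traits to long form (sp, factor, level), attach the long-form
# abundances (site, sp, val), then divide by the site total for each trait.
df2 %>%
  mutate(sp = rownames(.)) %>%
  gather(factor, level, -sp) %>%
  left_join(df1 %>% mutate(site = rownames(.)) %>% gather(sp, val, -site),
            by = "sp") %>%
  group_by(factor, site) %>%
  mutate(frequency = val / sum(val)) %>%
  ungroup() %>%
  arrange(site, factor, level) %>%
  select(site, factor, level, frequency)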
# # A tibble: 27 x 4
#     site factor level frequency
#    <chr>  <chr> <chr>     <dbl>
#  1 site1     t1     a 0.2857143
#  2 site1     t1     b 0.7142857
#  3 site1     t1     c 0.0000000
#  4 site1     t2     x 0.7142857
#  5 site1     t2     y 0.0000000
#  6 site1     t2     z 0.2857143
#  7 site1     t3     m 0.2857143
#  8 site1     t3     n 0.7142857
#  9 site1     t3     o 0.0000000
# 10 site2     t1     a 0.0000000
# # ... with 17 more rows

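Note that duplicate row names are not allowed, so I had to make site a column. The order is slightly different from your schematic; let me know if that needs fixing.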
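One caveat: in the example data every trait level belongs to exactly one species, so dividing each species' abundance by the site total already gives the frequency of its level. If several species shared a level, the abundances would need to be summed per level first. The following is a minimal sketch of that variant, not part of the original answer. The trait table, in which sp1 and sp2 both have level "a" of t1, is hypothetical, and the `.groups` argument assumes dplyr 1.0 or later.

```r
library(dplyr)
library(tidyr)

# Abundance table from the answer: rows are sites, columns are species.
df1 <- data.frame(sp1 = c(4, 0, 0),
                  sp2 = c(10, 4, 0),
                  sp3 = c(0, 5, 3),
                  row.names = paste0("site", 1:3))

# Hypothetical trait table: sp1 and sp2 share level "a" of trait t1.
df2 <- data.frame(t1 = c("a", "a", "c"),
                  t2 = c("z", "x", "y"),
                  row.names = paste0("sp", 1:3),
                  stringsAsFactors = FALSE)

# Long-form abundances: one row per (site, sp).
abund <- df1 %>%
  mutate(site = rownames(df1)) %>%
  gather(sp, val, -site)

# Long-form traits: one row per (sp, factor).
traits <- df2 %>%
  mutate(sp = rownames(df2)) %>%
  gather(factor, level, -sp)

result <- traits %>%
  # Recent dplyr versions may warn about a many-to-many join here.
  left_join(abund, by = "sp") %>%
  # Sum abundance over all species that share a trait level ...
  group_by(site, factor, level) %>%
  summarise(val = sum(val), .groups = "drop") %>%
  # ... then divide by the site total for that trait.
  group_by(site, factor) %>%
  mutate(frequency = val / sum(val)) %>%
  ungroup() %>%
  select(site, factor, level, frequency)

print(result)
```

With this hypothetical table, level "a" of t1 at site2 collects the abundances of sp1 and sp2 (0 + 4), so its frequency is 4/9, and the frequencies within each site and trait still sum to 1.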