Summing rows with duplicate identifiers in tidyr::spread

Asked: 2016-10-03 16:30:17

Tags: r tidyr spread

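I'm working with some oddly formatted survey data (collected and recorded by someone else). It records species abundance for survey transects, but it lists only the species observed on a given transect, not every species that could have been recorded. It took me a while to figure out how to reshape the data with tidyr so that each survey has a column for every species, with unrecorded species filled with 0. Here is a short, reproducible example: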

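# This works:
library(dplyr)  # provides %>% and group_by()
library(tidyr)  # provides spread()

# Three surveys of ten records each; every species appears at most once per survey
Survey <- as.factor(c(rep("Survey 1", 10), rep("Survey 2", 10), rep("Survey 3", 10)))
Species <- as.factor(c(c("A","B","C","D","E","U","V","W","X","Y"), c("A","C","E","G","I","K","M","O","Q","S"), c("B","D","F","H","J","L","N","P","R","T")))
Abundance <- ceiling(runif(30, 1, 50))

working.df <- cbind.data.frame(Survey, Species, Abundance)

# One row per survey, one column per species, 0 where a species was not recorded
working.spread <- working.df %>%
  group_by(Survey) %>%
  spread(Species, Abundance, drop = FALSE, fill = 0)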

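Unfortunately, the real data are not that simple. In some cases the same species was recorded on multiple rows within one survey, so that information on other variables (which I'm not interested in) could be recorded. I only care about the total abundance per survey. Here is an example of what the real data might look like (note the double "A" at the start of Species2):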

# This doesn't work:
Species2 <- as.factor(c(c("A","A","C","D","E","U","V","W","X","Y"), c("A","C","E","G","I","K","M","O","Q","S"), c("B","D","F","H","J","L","N","P","R","T")))

not.working.df <- cbind.data.frame(Survey, Species2, Abundance)

not.working.spread <- not.working.df %>%
  group_by(Survey) %>%
  spread(Species2, Abundance, drop = FALSE, fill = 0)

So when the same species is listed twice within a survey, the spread call no longer works and returns the familiar error:

Error: Duplicate identifiers for rows (1, 2)

In the real dataset I get an error listing many duplicates (and this is only one of several datasets), so of course I don't want to fix them by hand:

Error: Duplicate identifiers for rows (206, 216), (1532, 1544), (1052, 1595), (1324, 1330), (191, 212), (194, 211), (1392, 1600), (19, 37), (1404, 1599), (199, 215), (1073, 1596), (1074, 1597), (43, 44, 45), (455, 456), (380, 381, 382, 383), (447, 448), (413, 414, 415, 416, 417, 418), (303, 304), (1015, 1016), (897, 898, 1593), (1306, 1307), (1041, 1594), (1076, 1598), (1425, 1426), (49, 64), (198, 214) 
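The row numbers in that error are hard to act on directly. As a minimal sketch (the toy data frame `dup.df` and the result name `dup.keys` are made up for illustration and are not part of the real dataset), one way to see which Survey/Species2 combinations are repeated is to count rows per combination with dplyr's `count()` and keep those that occur more than once:

```r
library(dplyr)

# Hypothetical toy data mirroring the problem: "A" appears twice in Survey 1
dup.df <- data.frame(
  Survey    = c("Survey 1", "Survey 1", "Survey 1", "Survey 2"),
  Species2  = c("A", "A", "C", "A"),
  Abundance = c(5, 7, 3, 2)
)

# Count rows per Survey/Species2 pair; count() stores the tally in column n
dup.keys <- dup.df %>%
  count(Survey, Species2) %>%
  filter(n > 1)

print(dup.keys)
```

Each row of `dup.keys` is a combination that `spread()` would reject as a duplicate identifier.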

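What I want to do is sum the Abundance field across the duplicate identifiers. I know there are similar questions here, and I have read through many of them, but I haven't found a solution. I struggled to get this far with spread, and it seems I'm just one simple function call away from getting it to work... Any suggestions would be much appreciated. Or, if I have completely missed an existing answer to this question, please point me to it.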

Cheers

1 Answer:

Answer 0 (score: 1)

Thanks, aosmith, for pointing me toward the threads on summarizing; that did the trick. Here is the working solution:

# Sum Abundance within each Survey/Species2 pair first, then spread to wide format
not.working.spread <- not.working.df %>%
  group_by(Survey, Species2) %>%
  summarize(Abundance = sum(Abundance)) %>%
  spread(Species2, Abundance, drop = FALSE, fill = 0)
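For readers on a newer tidyr, the same sum-then-reshape idea can probably be written in one step with `pivot_wider()`. This is only a sketch, not the method used above: it assumes tidyr 1.1.0 or later (where `values_fn` accepts a single function and `values_fill` a scalar), and the toy data frame `toy.df` is invented for illustration.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data: "A" is recorded twice in Survey 1
toy.df <- data.frame(
  Survey    = c("Survey 1", "Survey 1", "Survey 1", "Survey 2"),
  Species2  = c("A", "A", "C", "B"),
  Abundance = c(5, 7, 3, 2)
)

# values_fn = sum collapses duplicate Survey/Species2 cells;
# values_fill = 0 fills species not recorded in a survey
toy.wide <- toy.df %>%
  pivot_wider(
    names_from  = Species2,
    values_from = Abundance,
    values_fn   = sum,
    values_fill = 0
  )

print(toy.wide)
```

Note that this sketch creates columns only for species that actually occur in the data, so it does not reproduce what `drop = FALSE` does in `spread()` for unused factor levels.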