How to split rows using a data frame of proportions

Time: 2017-04-04 15:14:35

Tags: r dplyr tidyr

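In a large set of species data, I regrettably recorded two similar species as one and counted them together (I counted Sp2 rather than Sp2a and Sp2b). I have re-examined all the samples and measured what proportion of each combined count belongs to each species (e.g., for sample "north", Sp2 was counted 40 times, and I determined that 20% of that count should be Sp2a and 80% should be Sp2b).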

Does anyone know how to apply the proportion data in the chart data frame

samples <- c("north", "west", "south")
sp2a_props <- c(.2, .3, .4)
sp2b_props <- c(.8, .7, .6)
chart <- data.frame(samples, sp2a_props, sp2b_props, stringsAsFactors = FALSE)
chart

to the relevant rows in the raw data frame

samples <- c("north","north", "west","west","south", "south")
species <- c("Sp1", "Sp2", "Sp1", "Sp4", "Sp2", "Sp3")
counts <- c(20, 40, 30, 50, 30, 30)
raw <- data.frame(samples, species, counts, stringsAsFactors = FALSE)
raw

to get the desired new data frame?

samples <- c("north","north","north", "west","west","south", "south", "south")
species <- c("Sp1", "Sp2a", "Sp2b", "Sp1", "Sp4", "Sp2a", "Sp2b", "Sp3")
counts <- c(20, 8,32, 30, 50, 12, 18, 30)
desired_result <- data.frame(samples, species, counts)
desired_result

Although the dummy data only splits Sp2 into 2 parts, I may also need to split some lumped taxa into 3 parts.

1 Answer:

Answer 0: (score: 0)

Using dplyr and tidyr, you only need a few manipulations and a join to get what you want.

First, reshape chart from wide to long and strip '_props' from the species names to prepare for the downstream join.

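Second, manipulate the raw data frame to include the a/b split (use dplyr::case_when to handle multiple splits). Separate the split values into rows, unite them with species to get sp2a/sp2b, join with the chart values to get the proportions, multiply counts by the proportion where one exists, and drop the proportion column.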

library(dplyr)
library(tidyr)

chart <- chart  %>%
  gather(species, proportion, -samples) %>% 
  mutate(species = gsub("_props", "", species))

raw %>% 
  mutate(species = tolower(species)) %>% 
  mutate(split = ifelse(species == "sp2", "a,b", "")) %>% 
  separate_rows(split, sep = ",") %>% 
  unite(species, species, split, sep = "") %>% 
  left_join(chart, by = c("samples", "species")) %>% 
  mutate(counts = ifelse(!is.na(proportion), counts * proportion, counts)) %>% 
  select(-proportion)

Result:

  samples species counts
1   north     sp1     20
2   north    sp2a      8
3   north    sp2b     32
4    west     sp1     30
5    west     sp4     50
6   south    sp2a     12
7   south    sp2b     18
8   south     sp3     30
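
For the three-way case mentioned in the question, the same pipeline can be sketched with dplyr::case_when in place of ifelse. This is only an illustration under stated assumptions: the data frames raw3 and props3, the lumped taxon sp5, and its proportions are all made up for the example, and the proportions are assumed to already be in long format with lowercase species names.

```r
library(dplyr)
library(tidyr)

# Hypothetical data: sp2 splits two ways, sp5 splits three ways
raw3 <- data.frame(
  samples = c("north", "north", "west"),
  species = c("sp1", "sp2", "sp5"),
  counts  = c(20, 40, 60),
  stringsAsFactors = FALSE
)

# Hypothetical proportions, already in long format (one row per new species)
props3 <- data.frame(
  samples    = c("north", "north", "west", "west", "west"),
  species    = c("sp2a", "sp2b", "sp5a", "sp5b", "sp5c"),
  proportion = c(0.2, 0.8, 0.5, 0.25, 0.25),
  stringsAsFactors = FALSE
)

split3 <- raw3 %>%
  # One suffix list per lumped taxon; everything else gets an empty suffix
  mutate(split = case_when(
    species == "sp2" ~ "a,b",
    species == "sp5" ~ "a,b,c",
    TRUE ~ ""
  )) %>%
  # One row per suffix, then glue the suffix onto the species name
  separate_rows(split, sep = ",") %>%
  unite(species, species, split, sep = "") %>%
  # Attach proportions; rows that were not split get NA and keep their count
  left_join(props3, by = c("samples", "species")) %>%
  mutate(counts = ifelse(!is.na(proportion), counts * proportion, counts)) %>%
  select(-proportion)

split3
```

A quick sanity check is that sum(split3$counts) equals sum(raw3$counts), since splitting should only redistribute each lumped count, never change the total.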

(If you want the species names back in title case, I would use tools::toTitleCase.)
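
If you would rather not depend on a title-casing function, a minimal base R sketch that upper-cases only the first character of each name also works here. The helper name cap_first is hypothetical, introduced just for this illustration.

```r
# Hypothetical helper: upper-case the first character, leave the rest untouched
cap_first <- function(x) {
  paste0(toupper(substr(x, 1, 1)), substring(x, 2))
}

species_lower <- c("sp1", "sp2a", "sp2b")
species_title <- cap_first(species_lower)
species_title
```

This could be applied at the end of the pipeline with mutate(species = cap_first(species)).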