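In a large species dataset, I unfortunately recorded two similar species as one and counted them together (I counted Sp2 instead of Sp2a and Sp2b). I have re-examined all the samples and measured what proportion of each combined count belongs to each species (for example, for sample "north", Sp2 was counted 40 times, and I determined that 20% of that count should be Sp2a and 80% should be Sp2b).
Does anyone know how to apply the proportion data in the chart data frame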
samples <- c("north", "west", "south")
sp2a_props <- c(.2, .3, .4)
sp2b_props <- c(.8, .7, .6)
chart <- data.frame(samples, sp2a_props, sp2b_props, stringsAsFactors = FALSE)
chart
to the relevant rows in the raw data frame
samples <- c("north","north", "west","west","south", "south")
species <- c("Sp1", "Sp2", "Sp1", "Sp4", "Sp2", "Sp3")
counts <- c(20, 40, 30, 50, 30, 30)
raw <- data.frame(samples, species, counts, stringsAsFactors = FALSE)
raw
to get the desired new data frame?
samples <- c("north","north","north", "west","west","south", "south", "south")
species <- c("Sp1", "Sp2a", "Sp2b", "Sp1", "Sp4", "Sp2a", "Sp2b", "Sp3")
counts <- c(20, 8,32, 30, 50, 12, 18, 30)
desired_result <- data.frame(samples, species, counts)
desired_result
Although the dummy data only splits Sp2 into two parts, I may also need to split some lumped taxa into three parts.
Answer 0 (score: 0)
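With dplyr and tidyr, you only need a few manipulations and a join to get what you want.
First, reshape chart from wide to long and remove '_props' from the species names to prepare for the downstream join.
Second, manipulate the raw data frame to include the a/b split (the code here uses ifelse; dplyr::case_when would let you handle several different splits). Separate the split labels into rows, unite them with species to get sp2a/sp2b, join the result to the chart values to pick up the proportions, multiply the counts by the proportion where one exists, and drop the proportion column.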
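library(dplyr)
library(tidyr)
# Reshape chart from wide to long; strip "_props" so species reads sp2a / sp2b
chart <- chart %>%
  gather(species, proportion, -samples) %>%
  mutate(species = gsub("_props", "", species))
raw %>%
  # lowercase the names so they match the species names derived from chart
  mutate(species = tolower(species)) %>%
  # mark the lumped species with the suffixes it should be split into
  mutate(split = ifelse(species == "sp2", "a,b", "")) %>%
  # one row per suffix, then paste the suffix onto the species name
  separate_rows(split, sep = ",") %>%
  unite(species, species, split, sep = "") %>%
  # attach proportions; rows without a match get NA and keep their count
  left_join(chart, by = c("samples", "species")) %>%
  mutate(counts = ifelse(!is.na(proportion), counts * proportion, counts)) %>%
  select(-proportion)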
Result:
  samples species counts
1   north     sp1     20
2   north    sp2a      8
3   north    sp2b     32
4    west     sp1     30
5    west     sp4     50
6   south    sp2a     12
7   south    sp2b     18
8   south     sp3     30
(If you want the species names back in title case, I would use tools::toTitleCase.)
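For the case raised in the question where a lumped taxon has to be split into three (or more) parts, one possible variation is to keep the proportions in a long table that names the lumped taxon explicitly, so that the split comes from the join itself rather than from a hard-coded "a,b" string. The sketch below is only an illustration under assumptions: the props table and its lumped and species_new columns are hypothetical names that are not in the original post, and the proportions for each sample and lumped taxon are assumed to sum to 1.

```r
library(dplyr)

raw <- data.frame(
  samples = c("north", "north", "west", "west", "south", "south"),
  species = c("Sp1", "Sp2", "Sp1", "Sp4", "Sp2", "Sp3"),
  counts  = c(20, 40, 30, 50, 30, 30),
  stringsAsFactors = FALSE
)

# Hypothetical long-format proportion table (an assumption, not from the post):
# one row per sample, lumped taxon and sub-taxon.
props <- data.frame(
  samples     = rep(c("north", "west", "south"), each = 2),
  lumped      = "Sp2",
  species_new = rep(c("Sp2a", "Sp2b"), times = 3),
  proportion  = c(.2, .8, .3, .7, .4, .6),
  stringsAsFactors = FALSE
)

result <- raw %>%
  # each lumped row is duplicated once per matching sub-taxon;
  # rows with no match in props keep NA in species_new and proportion
  left_join(props, by = c("samples", "species" = "lumped")) %>%
  mutate(
    species = coalesce(species_new, species),  # keep the old name if not split
    counts  = counts * coalesce(proportion, 1) # unsplit rows keep their count
  ) %>%
  select(samples, species, counts)

result
```

With this layout, a three-way split would just mean three rows per sample in props (for instance a hypothetical Sp2c alongside Sp2a and Sp2b), with no change to the pipeline. Rows in props for samples that never recorded the lumped taxon, such as "west" here, are simply ignored by the left join.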