Calculating differences based on a categorical column with tidyverse

Time: 2018-08-17 07:35:12

Tags: r dplyr tidyverse

I have the following data frame:

library(tidyverse)

df <- data.frame(
  vars = rep(letters[1:2], 3),
  value = c(10,12,15,19,22,23),
  phase = rep(factor(c("pre","post1","post2"), levels = c("pre","post1","post2")),2)
) %>% 
  arrange(vars,phase)

I want to compute the following differences in value:

  • post1 - pre
  • post2 - post1
  • post2 - pre

for each of the vars (i.e. a and b).

What is the most efficient way to achieve this with the tidyverse?

Expected result:

 vars         x     diffs
    a   post1 - pre    12
    a post2 - post1    -7
    a   post2 - pre     5
    b   post1 - pre    -7
    b post2 - post1    11
    b   post2 - pre     4

2 answers:

Answer 0: (score: 3)

You can use spread and gather from tidyr: first spread phase into columns, compute the differences, then gather the result back into long format:

library(dplyr)
library(tidyr)
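df %>%
    spread(phase, value) %>%                      # one column per phase: pre, post1, post2
    mutate("post1 - pre"   = post1 - pre,
           "post2 - post1" = post2 - post1,
           "post2 - pre"   = post2 - pre) %>%
    select(-pre, -post1, -post2) %>%              # keep only vars and the difference columns
    gather("x", "diffs", -vars)                   # back to long format, named as in the expected result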
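As a hedged aside: in tidyr 1.0.0 and later, spread() and gather() are superseded by pivot_wider() and pivot_longer(). The sketch below shows the same reshape-compute-reshape idea with those functions, assuming tidyr >= 1.0.0 is installed. The name res is only illustrative.

```r
library(dplyr)
library(tidyr)

# Same example data as in the question
df <- data.frame(
  vars  = rep(letters[1:2], 3),
  value = c(10, 12, 15, 19, 22, 23),
  phase = rep(factor(c("pre", "post1", "post2"),
                     levels = c("pre", "post1", "post2")), 2)
) %>%
  arrange(vars, phase)

res <- df %>%
  # one row per vars, one column per phase
  pivot_wider(names_from = phase, values_from = value) %>%
  # keep vars and compute the three differences
  transmute(
    vars,
    `post1 - pre`   = post1 - pre,
    `post2 - post1` = post2 - post1,
    `post2 - pre`   = post2 - pre
  ) %>%
  # back to long format: one row per (vars, difference)
  pivot_longer(-vars, names_to = "x", values_to = "diffs")

res
```

Because pivot_longer() stacks row by row, the rows come out grouped by vars, in the same order as the expected result in the question.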

Answer 1: (score: 1)

Here is a more automated approach that produces all the required combinations once you specify the order the differences should follow:

library(tidyverse)

# example dataset
df <- data.frame(
  vars = rep(letters[1:2], 3),
  value = c(10,12,15,19,22,23),
  phase = rep(factor(c("pre","post1","post2"), levels = c("pre","post1","post2")),2)
) %>% 
  arrange(vars,phase)

# set the levels in the right order based on the differences you want to get
df$phase = factor(df$phase, levels = c("post2","post1","pre"))


data.frame(t(combn(as.character(sort(unique(df$phase))), 2)), stringsAsFactors = FALSE) %>%  # create a dataframe of unique combinations of differences you want to investigate
  mutate(vars = list(unique(df$vars))) %>%          # add unique vars as a list
  unnest() %>%                                      # get all combinations
  group_by(id = row_number()) %>%                   # for each row
  nest() %>%                                        # nest data
  mutate(diffs = map(data, ~df$value[df$vars==.$vars & df$phase==.$X1] - 
                            df$value[df$vars==.$vars & df$phase==.$X2]),   # get differences based on corresponding values
         x = map(data, ~paste0(c(.$X1, .$X2), collapse = " - "))) %>%      # create your x column
  unnest() %>%                                      # unnest data
  select(vars, x, diffs)                            # keep relevant columns

# # A tibble: 6 x 3
#   vars  x             diffs
#   <fct> <chr>         <dbl>
# 1 a     post2 - post1    -7
# 2 b     post2 - post1    11
# 3 a     post2 - pre       5
# 4 b     post2 - pre       4
# 5 a     post1 - pre      12
# 6 b     post1 - pre      -7
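The same "all pairwise combinations" idea can also be sketched per group, without nesting. This is a minimal sketch and not the answer author's method: it assumes each vars group has exactly one row per phase, and the helper name pairwise_diffs is hypothetical. It relies on base R's combn() to enumerate index pairs and on dplyr's group_modify() to apply the helper to each group.

```r
library(dplyr)

# Same example data as in the question
df <- data.frame(
  vars  = rep(letters[1:2], 3),
  value = c(10, 12, 15, 19, 22, 23),
  phase = rep(factor(c("pre", "post1", "post2"),
                     levels = c("pre", "post1", "post2")), 2)
)

# Hypothetical helper: 'later phase minus earlier phase' for every pair of rows.
# Assumes d is already sorted by phase and has one row per phase.
pairwise_diffs <- function(d) {
  idx <- combn(nrow(d), 2)               # 2 x choose(n, 2) matrix of row index pairs
  tibble(
    x     = paste(as.character(d$phase[idx[2, ]]), "-",
                  as.character(d$phase[idx[1, ]])),
    diffs = d$value[idx[2, ]] - d$value[idx[1, ]]
  )
}

res <- df %>%
  arrange(vars, phase) %>%               # factor levels define the phase order
  group_by(vars) %>%
  group_modify(~ pairwise_diffs(.x)) %>% # apply the helper within each vars group
  ungroup()

res
```

With three phases this yields "post1 - pre", "post2 - pre" and "post2 - post1" for each of a and b. Adding a fourth phase level would produce six differences per group without any change to the code.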