Reshaping a df with dplyr using gather/spread on multiple variables

Posted: 2019-03-14 05:58:24

Tags: r reshape

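I'm trying to reshape a dataset with the gather() and spread() functions (from tidyr, used in a dplyr pipeline) to go from this data shape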

library(tidyverse)
# test data set (rnorm()/runif() are random, so your values will differ)
df = tibble(factor = c("a","a","b","b"),
           factor2 = c("d1","d2","d1","d2"),
           value1 = round(rnorm(4),1)*10,
           value2 = round(runif(4),2)*100)

which looks like this:

# A tibble: 4 x 4
  factor factor2 value1 value2
  <chr>  <chr>    <dbl>  <dbl>
1 a      d1           4     97
2 a      d2         -21     10
3 b      d1          -2     65
4 b      d2         -14     93

to this shape:

factor    d1val1   d1val2  d2val1  d2val2
a          4        97      -21     10
b         -2        65      -14     93

Ideally, I'd like to achieve this with spread/gather in a dplyr pipeline.

3 answers:

Answer 0 (score: 3)

For completeness: the data.table implementation of dcast() can reshape multiple value variables at once:

library(data.table)
dcast(setDT(df), factor ~ factor2, value.var = c("value1", "value2"))
   factor value1_d1 value1_d2 value2_d1 value2_d2
1:      a         4       -21        97        10
2:      b        -2       -14        65        93
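Base R's stats::reshape() can also widen several value columns in one call, so a dependency-free sketch is possible. This is not part of the answer above: it is a swapped-in base R technique, it reuses the values printed in the question instead of the random draws, and the final renaming to the question's d1val1 style is an assumption about the names you want.

```r
# Base R sketch: widen value1 and value2 at once with stats::reshape().
# Uses the values printed in the question instead of rnorm()/runif().
df <- data.frame(
  factor  = c("a", "a", "b", "b"),
  factor2 = c("d1", "d2", "d1", "d2"),
  value1  = c(4, -21, -2, -14),
  value2  = c(97, 10, 65, 93)
)

# idvar:   column identifying one row of the wide result
# timevar: column whose values become part of the new column names
# every remaining column (value1, value2) is widened
wide <- reshape(df, idvar = "factor", timevar = "factor2", direction = "wide")

# reshape() names the columns value1.d1, value2.d1, value1.d2, value2.d2;
# rewrite them into the d1val1-style names asked for in the question
names(wide) <- sub("^value(\\d)\\.(d\\d)$", "\\2val\\1", names(wide))
rownames(wide) <- NULL
print(wide)
```

reshape() keeps the first occurrence of each idvar value, so the rows come out in the order a, b.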

Answer 1 (score: 2)

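One option is to gather the "value" columns into "long" format, then unite the "factor2" and "key" columns into a single column, and finally spread back to "wide" format:

library(dplyr)
library(tidyr)

df %>%
  gather(key, val, value1:value2) %>%
  unite(dcols, factor2, key, sep = "") %>%
  spread(dcols, val)

Because value1 and value2 have the same column type, gather can stack them into a single val column without any type conversion.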
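To see what each step of that pipeline produces, here is a sketch that keeps the intermediate long table. It assumes the values printed in the question in place of the rnorm()/runif() draws so the result is reproducible; the names long and wide are only for illustration.

```r
library(dplyr)
library(tidyr)

# The question's printed values, so the output is reproducible
df <- tibble(factor  = c("a", "a", "b", "b"),
             factor2 = c("d1", "d2", "d1", "d2"),
             value1  = c(4, -21, -2, -14),
             value2  = c(97, 10, 65, 93))

# Steps 1 and 2: stack value1/value2 into key/val (8 rows),
# then paste factor2 and key together into dcols, e.g. "d1value1"
long <- df %>%
  gather(key, val, value1:value2) %>%
  unite(dcols, factor2, key, sep = "")
print(long)

# Step 3: every distinct dcols value becomes its own column,
# with one row per remaining identifier (factor)
wide <- long %>% spread(dcols, val)
print(wide)
```

spread() sorts the new columns alphabetically, which is why d1value1 and d1value2 come before the d2 columns.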

Answer 2 (score: 2)

Another tidyverse possibility:

df %>% 
 gather(var, val, -c(factor, factor2)) %>%
 mutate(var = paste0(factor2, var)) %>%
 select(-factor2) %>%
 spread(var, val) 

  factor d1value1 d1value2 d2value1 d2value2
  <chr>     <dbl>    <dbl>    <dbl>    <dbl>
1 a            -4       85       -4       65
2 b             4       39       -1       20

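It first converts the data from wide to long format, excluding the variables "factor" and "factor2". Second, it pastes the value of "factor2" onto each variable name. Finally, it drops the now-redundant "factor2" column and spreads the data back into the desired wide format.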
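gather() and spread() have since been superseded in tidyr by pivot_longer() and pivot_wider(), and pivot_wider() can widen several value columns without the intermediate long step. A minimal sketch, assuming tidyr 1.1.0 or later (for names_glue) and the values printed in the question; the final column reordering is only cosmetic.

```r
library(tibble)
library(tidyr)

# The question's printed values, so the output is reproducible
df <- tibble(factor  = c("a", "a", "b", "b"),
             factor2 = c("d1", "d2", "d1", "d2"),
             value1  = c(4, -21, -2, -14),
             value2  = c(97, 10, 65, 93))

# names_from:  column whose values go into the new column names
# values_from: both value columns are widened in one call
# names_glue:  build names from factor2 and the source column name (.value)
wide <- pivot_wider(df,
                    names_from  = factor2,
                    values_from = c(value1, value2),
                    names_glue  = "{factor2}{.value}")

# Reorder so the columns match the other answers (d1 columns first)
wide <- wide[, c("factor", "d1value1", "d1value2", "d2value1", "d2value2")]
print(wide)
```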