Preserving column order when going from wide to long format

Time: 2017-10-27 18:15:41

Tags: r tidyr

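I am trying to preserve the order of the columns when I gather them from wide to long format. The problem is that the order is lost after gather and summarize. The number of columns is large, so I don't want to type out the order by hand.

Here is an example: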

library(tidyr)
library(dplyr)

N <- 4
df <- data.frame(sample = c(1,1,2,2),
                 y1.1 = rnorm(N), y2.1 = rnorm(N), y10.1 = rnorm(N))
> df
  sample      y1.1      y2.1      y10.1
1      1  1.040938 0.8851727 -0.3617224
2      1  1.175879 1.0009824 -1.1352406
3      2 -1.501832 0.3446469 -1.8687008
4      2 -1.326817 0.4434628 -0.8795962

What I want is to preserve the order of the columns. After I perform a few operations, the order is lost. See here:

dfg <- df %>% 
  gather(key="key", value="value", -sample) %>%
  group_by(sample, key) %>%
  summarize(mean = mean(value))

> filter(dfg, sample == 1)
  sample   key       mean
   <dbl> <chr>      <dbl>
1      1  y1.1  0.2936335
2      1 y10.1  0.6170505
3      1  y2.1 -0.2250543

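You can see that y10.1 is placed before y2.1, which I don't want. What I want is to preserve this order, as shown here:

dfg <- df %>% 
  gather(key="key", value="value", -sample)

> filter(dfg, sample == 1)
  sample   key       value
1      1  y1.1  0.60171521
2      1  y1.1 -0.01444823
3      1  y2.1  0.81566726
4      1  y2.1 -1.26577581
5      1 y10.1  0.41686388
6      1 y10.1  0.81723707

For some reason, the group_by and summarize operations change the order. I don't know why. I tried the ungroup command, but it didn't do anything. As I said before, my actual data frame has many columns, and I need to preserve the order. The reason for preserving the order is so that I can plot the data in the correct order.

Any ideas?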

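5 answers:

Answer 0: (score: 2)

Alternatively, you can convert the key column to a factor whose levels reflect the order of the original column names: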

df %>% 
    gather(key="key", value="value", -sample) %>%
    mutate(key=factor(key, levels=names(df)[-1])) %>% # add this line to convert the key to a factor
    group_by(sample, key) %>%
    summarize(mean = mean(value)) %>%
    filter(sample == 1)

# A tibble: 3 x 3
# Groups:   sample [1]
#  sample    key       mean
#   <dbl> <fctr>      <dbl>
#1      1   y1.1  0.8310786
#2      1   y2.1 -1.2596933
#3      1  y10.1  0.8208812
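
As a side note on why the factor conversion helps: summarize() returns its rows ordered by the grouping keys, so a character key comes back alphabetically sorted while a factor key follows its level order. The sketch below illustrates this on a small made-up data frame; the values and the names `long`, `by_chr` and `by_fct` are hypothetical, and the `.groups` argument assumes dplyr 1.0.0 or later.

```r
library(dplyr)

# Hypothetical toy data, already in long format (values are made up)
long <- data.frame(
  sample = c(1, 1, 1),
  key    = c("y1.1", "y2.1", "y10.1"),
  value  = c(0.1, 0.2, 0.3),
  stringsAsFactors = FALSE
)

# Character key: groups come back sorted alphabetically
by_chr <- long %>%
  group_by(sample, key) %>%
  summarize(mean = mean(value), .groups = "drop")

# Factor key: groups come back in level order (here, order of first appearance)
by_fct <- long %>%
  mutate(key = factor(key, levels = unique(key))) %>%
  group_by(sample, key) %>%
  summarize(mean = mean(value), .groups = "drop")

by_chr$key                # y10.1 sorts before y2.1
as.character(by_fct$key)  # original order is kept
```

Using `levels = unique(key)` instead of `names(df)[-1]` is only a variation for the case where the wide data frame is no longer at hand; it relies on the keys first appearing in the desired order.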

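Answer 1: (score: 1)

I found a workable solution using a lookup table. It seems to work for me, because I can extract the column names, assign each one an ordered index, and then join that to my data.frame.

Here is the solution: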

lookup <- tibble(key = c("y1.1", "y2.1", "y10.1"),
                 index = c(1,2,3))

> left_join(dfg, lookup, by="key")
# A tibble: 6 x 4
  sample   key       mean index
   <dbl> <chr>      <dbl> <dbl>
1      1  y1.1  0.2936335     1
2      1 y10.1  0.6170505     3
3      1  y2.1 -0.2250543     2
4      2  y1.1  1.3652070     1
5      2 y10.1  0.9889233     3
6      2  y2.1  0.5216553     2
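
Since the question mentions that there are too many columns to type by hand, the lookup table can presumably be generated from the column names, and the joined result still has to be sorted by the index. A minimal sketch, assuming `sample` is the only id column; `value_cols` and `dfg_ordered` are names made up for this illustration, and `set.seed()` is there only to make the sketch reproducible.

```r
library(dplyr)
library(tidyr)

set.seed(1)  # only for reproducibility of this sketch
df <- data.frame(sample = c(1, 1, 2, 2),
                 y1.1 = rnorm(4), y2.1 = rnorm(4), y10.1 = rnorm(4))

# Build the lookup from the original column names instead of typing it
value_cols <- setdiff(names(df), "sample")
lookup <- tibble(key = value_cols, index = seq_along(value_cols))

dfg <- df %>%
  gather(key = "key", value = "value", -sample) %>%
  group_by(sample, key) %>%
  summarize(mean = mean(value), .groups = "drop")

# Join the index, then sort by it within each sample
dfg_ordered <- dfg %>%
  left_join(lookup, by = "key") %>%
  arrange(sample, index)

dfg_ordered$key
```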

Answer 2: (score: 1)

If your columns really are ordered by the number they contain, this should work:

library(readr)

df %>% 
  gather(key="key", value="value", -sample) %>%
  group_by(sample, key)         %>%
  summarize(mean = mean(value)) %>%
  arrange(parse_number(key))    %>%  # <- sorting by number contained in key
  filter(sample == 1)

# A tibble: 3 x 3
# Groups:   sample [1]
#   sample   key       mean
#    <dbl> <chr>      <dbl>
# 1      1  y1.1 -0.9236688
# 2      1  y2.1 -0.2168337
# 3      1 y10.1  0.5041981
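
Before relying on this, it may be worth checking that sorting by the extracted numbers really reproduces the original column order. A small sketch using the column names from the question; `cols` and `nums` are names introduced only for this check.

```r
library(readr)

cols <- c("y1.1", "y2.1", "y10.1")  # column names from the question
nums <- parse_number(cols)          # numeric part of each name: 1.1, 2.1, 10.1
nums

# TRUE only if sorting by the parsed numbers gives back the original order
identical(cols[order(nums)], cols)
```

If this check returns FALSE for the real column names, one of the factor-based or lookup-based answers is likely the safer choice.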

Answer 3: (score: 1)

The tidyverse packages now allow an elegant solution:

    library(tidyverse)
    N <- 4
    df <- data.frame(sample = c(1,1,2,2),
                    y1.1 = rnorm(N), y2.1 = rnorm(N), y10.1 = rnorm(N))
    df %>% 
        gather("key", "value", -sample, factor_key = TRUE) %>%  # store key as a factor in column order
        group_by(sample, key) %>%
        summarise(mean = mean(value))

This results in

    # A tibble: 6 x 3
    # Groups:   sample [2]
      sample key      mean
       <dbl> <fct>   <dbl>
    1      1 y1.1   0.0894
    2      1 y2.1   0.551 
    3      1 y10.1  0.254 
    4      2 y1.1  -0.555 
    5      2 y2.1  -1.36  
    6      2 y10.1 -0.794 
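
In newer tidyr releases gather() has been superseded by pivot_longer(), which, as far as I know, has no direct `factor_key` argument. A rough equivalent is sketched below, assuming `sample` is the only id column; it sets the factor levels explicitly from the original column names, and `res` is a name made up for this example.

```r
library(dplyr)
library(tidyr)

set.seed(1)  # only for reproducibility of this sketch
df <- data.frame(sample = c(1, 1, 2, 2),
                 y1.1 = rnorm(4), y2.1 = rnorm(4), y10.1 = rnorm(4))

res <- df %>%
  pivot_longer(-sample, names_to = "key", values_to = "value") %>%
  # levels follow the original column order, so summarise() keeps that order
  mutate(key = factor(key, levels = setdiff(names(df), "sample"))) %>%
  group_by(sample, key) %>%
  summarise(mean = mean(value), .groups = "drop")

levels(res$key)
```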

Answer 4: (score: 0)

Another approach is to arrange the data frame by a custom version of the key column that you want to sort on:

library(dplyr)
library(tidyr)

df %>% 
  gather(key="key", value="value", -sample) %>%
  group_by(sample, key) %>%
  summarize(mean = mean(value)) %>%
  arrange(as.numeric(stringr::str_replace(key, "y", "")), .by_group = TRUE)

#> # A tibble: 6 x 3
#> # Groups:   sample [2]
#>   sample   key        mean
#>    <dbl> <chr>       <dbl>
#> 1      1  y1.1  0.07001689
#> 2      1  y2.1  1.15349430
#> 3      1 y10.1  1.18266024
#> 4      2  y1.1  0.42616604
#> 5      2  y2.1  1.05891682
#> 6      2 y10.1 -0.12561209