Convert a data frame from wide to long format in R using keys stored in a row

Date: 2019-05-08 20:11:42

Tags: r tidyr

I'm using the tidyverse, but base R solutions are also welcome.

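Is there a way to gather a data frame, without transposing it, when the key is stored in a row instead of in the column names? For example, suppose I have a tibble called df.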

df <- tibble(a = c(5,3,5,6,2,"G1"),
             b = c(5,3,5,6,2,"G1"),
             c = c(8,2,6,4,1,"G2"),
             d = c(8,2,6,4,1,"G2"),
             e = c(9,3,7,8,4,"G3"),
             f = c(9,3,7,8,4,"G3"),
             g = c(6,5,2,1,8,"G4"),
             h = c(6,5,2,1,8,"G4"))
df
# A tibble: 6 x 8
  a     b     c     d     e     f     g     h    
  <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 5     5     8     8     9     9     6     6    
2 3     3     2     2     3     3     5     5    
3 5     5     6     6     7     7     2     2    
4 6     6     4     4     8     8     1     1    
5 2     2     1     1     4     4     8     8    
6 G1    G1    G2    G2    G3    G3    G4    G4 

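The groups to gather by are in the bottom row. Is there a way to reshape df so that it has only three columns: columns c, e and g gathered into column a, columns d, f and h gathered into column b, and row 6 becoming column c? The result would look like this: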

tibble(a = c(5,3,5,6,2,8,2,6,4,1,9,3,7,8,4,6,5,2,1,8),
       b = c(5,3,5,6,2,8,2,6,4,1,9,3,7,8,4,6,5,2,1,8),
       c = c("G1","G1","G1","G1","G1","G2","G2","G2","G2","G2",
             "G3","G3","G3","G3","G3","G4","G4","G4","G4","G4"))
# A tibble: 20 x 3
       a     b c    
   <dbl> <dbl> <chr>
 1     5     5 G1   
 2     3     3 G1   
 3     5     5 G1   
 4     6     6 G1   
 5     2     2 G1   
 6     8     8 G2   
 7     2     2 G2   
 8     6     6 G2   
 9     4     4 G2   
10     1     1 G2   
11     9     9 G3   
12     3     3 G3   
13     7     7 G3   
14     8     8 G3   
15     4     4 G3   
16     6     6 G4   
17     5     5 G4   
18     2     2 G4   
19     1     1 G4   
20     8     8 G4 

I'd like to avoid transposing because I need to keep the row and column order intact until everything has been labelled correctly.

4 answers:

Answer 0 (score: 3)

Here is one idea.

library(tidyverse)

df2 <- df %>%
  t() %>%
  as.data.frame(stringsAsFactors = FALSE) %>%
  split(f = .$V6) %>%
  map_dfr(~.x %>% 
            select(-V6) %>%
            t() %>%
            as.data.frame(stringsAsFactors = FALSE) %>%
            setNames(c("a", "b")),
          .id = "c") %>%
  select(a, b, c) %>%
  mutate_at(vars(-c), list(~as.numeric(.)))

df2
#    a b  c
# 1  5 5 G1
# 2  3 3 G1
# 3  5 5 G1
# 4  6 6 G1
# 5  2 2 G1
# 6  8 8 G2
# 7  2 2 G2
# 8  6 6 G2
# 9  4 4 G2
# 10 1 1 G2
# 11 9 9 G3
# 12 3 3 G3
# 13 7 7 G3
# 14 8 8 G3
# 15 4 4 G3
# 16 6 6 G4
# 17 5 5 G4
# 18 2 2 G4
# 19 1 1 G4
# 20 8 8 G4
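Since the question is tagged tidyr and asks to avoid transposing, here is a minimal sketch along those lines. It assumes tidyr >= 1.0.0 (where `pivot_longer()` was introduced, after this thread was posted) and assumes that within each group the first column always goes to `a` and the second to `b`. The key row is pasted into the column names, and `pivot_longer()` splits them back out using the `.value` sentinel; `keys` and `side` are illustrative names only.

```r
library(dplyr)
library(tidyr)

df <- tibble(a = c(5,3,5,6,2,"G1"), b = c(5,3,5,6,2,"G1"),
             c = c(8,2,6,4,1,"G2"), d = c(8,2,6,4,1,"G2"),
             e = c(9,3,7,8,4,"G3"), f = c(9,3,7,8,4,"G3"),
             g = c(6,5,2,1,8,"G4"), h = c(6,5,2,1,8,"G4"))

keys <- unlist(df[nrow(df), ])            # "G1" "G1" "G2" ... read from the last row
side <- rep_len(c("a", "b"), ncol(df))    # assumption: columns alternate a, b in each group

res <- df %>%
  slice(-n()) %>%                                   # drop the key row
  setNames(paste(side, keys, sep = "_")) %>%        # a_G1, b_G1, a_G2, b_G2, ...
  mutate(row = row_number()) %>%                    # remember the original row order
  pivot_longer(-row, names_to = c(".value", "c"), names_sep = "_") %>%
  arrange(c, row) %>%                               # group by key, keep row order within it
  mutate(a = as.numeric(a), b = as.numeric(b)) %>%
  select(a, b, c)
```

Carrying an explicit `row` column means the final `arrange()` restores the original row order within each group without relying on how ties are sorted.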

Answer 1 (score: 2)

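Here is one way to do it. We can split the tibble into a list of tibbles based on the last row, loop through the list with imap, rename the columns to the same names ('a', 'b') with rename_all, use mutate to create column 'c' from the list names, and bind the rows.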

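library(tidyverse)
df %>% 
  slice(-n()) %>%                        # drop the key row
  split.default(df %>%                   # split the columns by the values in the last row
                  slice(n()) %>% 
                  flatten_chr) %>%
  imap_dfr(~ .x %>% 
             rename_all(~ c('a', 'b')) %>%  # same column names in every piece
             mutate(c = .y))                # .y is the list name, e.g. "G1"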
# A tibble: 20 x 3
#   a     b     c    
#   <chr> <chr> <chr>
# 1 5     5     G1   
# 2 3     3     G1   
# 3 5     5     G1   
# 4 6     6     G1   
# 5 2     2     G1   
# 6 8     8     G2   
# 7 2     2     G2   
# 8 6     6     G2   
# 9 4     4     G2   
#10 1     1     G2   
#11 9     9     G3   
#12 3     3     G3   
#13 7     7     G3   
#14 8     8     G3   
#15 4     4     G3   
#16 6     6     G4   
#17 5     5     G4   
#18 2     2     G4   
#19 1     1     G4   
#20 8     8     G4  
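The key step in this answer is `split.default()`, which, when given a data frame, splits its columns rather than its rows. A small sketch on made-up toy data (`x` and `keys` are hypothetical) showing the intermediate list, using `imap()` plus `bind_rows()` in place of `imap_dfr()` and `set_names()` in place of `rename_all()`:

```r
library(dplyr)
library(purrr)

# Toy data, made up for illustration: two groups of two columns each
x <- tibble(a = 1:2, b = 3:4, c = 5:6, d = 7:8)
keys <- c("G1", "G1", "G2", "G2")  # plays the role of the question's last row

# split.default() on a data frame splits by columns: one tibble per key
parts <- split.default(x, keys)
# parts$G1 holds columns a, b; parts$G2 holds columns c, d

# Give every piece the same column names, tag it with its key, then stack
out <- parts %>%
  imap(~ .x %>% set_names(c("a", "b")) %>% mutate(c = .y)) %>%
  bind_rows()
```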

Answer 2 (score: 1)

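Transposing need not do any harm if you do it step by step. In this base R solution, the row and column information is kept right up to the last line.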

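# Transpose: one row per original column (a..h), one column per original row (X1..X6)
d <- data.frame(t(as.matrix(df)))
# Split the data rows by the key in X6, then transpose each piece back
l <- lapply(split(d[-6], d$X6), t)
# Append the key as column c and stack the pieces
res <- do.call(rbind, Map(cbind, l, c=names(l)))
res <- setNames(data.frame(res, row.names=NULL), letters[1:3])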
res
#    a b  c
# 1  5 5 G1
# 2  3 3 G1
# 3  5 5 G1
# 4  6 6 G1
# 5  2 2 G1
# 6  8 8 G2
# 7  2 2 G2
# 8  6 6 G2
# 9  4 4 G2
# 10 1 1 G2
# 11 9 9 G3
# 12 3 3 G3
# 13 7 7 G3
# 14 8 8 G3
# 15 4 4 G3
# 16 6 6 G4
# 17 5 5 G4
# 18 2 2 G4
# 19 1 1 G4
# 20 8 8 G4
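For comparison, here is a base R sketch that never transposes. It reads the keys from the last row, groups the column positions by key, and stacks each group's two columns. Like the question, it assumes the first column of each group belongs in `a` and the second in `b`; `keys`, `body` and `groups` are names chosen for illustration. Using `factor(keys, levels = unique(keys))` keeps the groups in the order they first appear rather than in sorted order.

```r
df <- data.frame(
  a = c(5,3,5,6,2,"G1"), b = c(5,3,5,6,2,"G1"),
  c = c(8,2,6,4,1,"G2"), d = c(8,2,6,4,1,"G2"),
  e = c(9,3,7,8,4,"G3"), f = c(9,3,7,8,4,"G3"),
  g = c(6,5,2,1,8,"G4"), h = c(6,5,2,1,8,"G4"),
  stringsAsFactors = FALSE
)

keys <- unlist(df[nrow(df), ])   # named character vector: a = "G1", b = "G1", ...
body <- df[-nrow(df), ]          # the data rows, without the key row

# Column positions belonging to each key, in order of first appearance
groups <- split(seq_along(body), factor(keys, levels = unique(keys)))

res <- do.call(rbind, lapply(names(groups), function(k) {
  cols <- body[groups[[k]]]      # the two columns of this group
  data.frame(a = as.numeric(cols[[1]]),
             b = as.numeric(cols[[2]]),
             c = k,
             stringsAsFactors = FALSE)
}))
```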

Answer 3 (score: 0)

An option with data.table

First, since we aren't using the original names, replace them. Also drop the last row and convert everything to integer.

library(data.table)
setDT(df)

df <- df[-.N]
df[, names(df) := lapply(.SD, as.integer)]
setnames(df, rep_len(c('a', 'b'), ncol(df)))

#    a b a b a b a b
# 1: 5 5 8 8 9 9 6 6
# 2: 3 3 2 2 3 3 5 5
# 3: 5 5 6 6 7 7 2 2
# 4: 6 6 4 4 8 8 1 1
# 5: 2 2 1 1 4 4 8 8

Now melt on the row number, add the G[1-4] column, and dcast the melted table back to wide format.

df[, rid := 1:.N]
df2 <- melt(df, 'rid')
df2[, c := paste0('G', rowid(rid, variable))]
dcast(df2, rid + c ~ variable)[order(c), -'rid']

#      c a b
#  1: G1 5 5
#  2: G1 3 3
#  3: G1 5 5
#  4: G1 6 6
#  5: G1 2 2
#  6: G2 8 8
#  7: G2 2 2
#  8: G2 6 6
#  9: G2 4 4
# 10: G2 1 1
# 11: G3 9 9
# 12: G3 3 3
# 13: G3 7 7
# 14: G3 8 8
# 15: G3 4 4
# 16: G4 6 6
# 17: G4 5 5
# 18: G4 2 2
# 19: G4 1 1
# 20: G4 8 8
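To see what the `rowid(rid, variable)` step does, here is a sketch on a small hypothetical long table standing in for the output of `melt(df, 'rid')`. The values are invented, `variable` is a plain character column here, and `long` and `wide` are illustrative names.

```r
library(data.table)

# Hypothetical long table: 2 original rows (rid) whose columns were renamed a, b, a, b
long <- data.table(rid      = c(1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L),
                   variable = c("a", "a", "b", "b", "a", "a", "b", "b"),
                   value    = 1:8)

# rowid() numbers repeated (rid, variable) pairs in order of appearance,
# so the n-th a/b pair of each row gets the group label Gn
long[, c := paste0("G", rowid(rid, variable))]

# Back to wide: one row per (rid, c), with columns a and b
wide <- dcast(long, rid + c ~ variable, value.var = "value")
wide <- wide[order(c, rid)]
wide[, rid := NULL]
```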