R: Reshape data from many columns into rows

Time: 2020-06-25 16:08:17

Tags: r layout spreadsheet tidyr data-wrangling

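I have a df with many columns; you can find my template below. I would like to reshape it from columns to rows in R. I am sure this can be done with the tidyr::gather() function, but I could not manage it. I would be glad if someone could help me!

Best wishes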

# Df I have
             A1 A2 A3 A4  B1 B2 B3 B4  C1 C2 C3  C4  D1 D2 D3 D4
X1 X2 X3 X4   a b  c  d   e  f  g  h    i  j  k  l
Y1 Y2 Y3 Y4   m n  o  p    
Z1 Z2 Z3 Z4   r s  t  u   w  v  y  z 


# Df I would like to reshape

            Col1 Col2 Col3 Col4
X1 X2 X3 X4   a   b    c   d
X1 X2 X3 X4   e   f    g   h
X1 X2 X3 X4   i   j    k   l
Y1 Y2 Y3 Y4   m   n    o   p
Z1 Z2 Z3 Z4   r   s    t   u
Z1 Z2 Z3 Z4   w   v    y   z

2 answers:

Answer 0 (score: 2)

We can also use pivot_longer:

library(dplyr)
library(tidyr)
library(stringr)
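df %>% 
      # split names such as "A1" between the letter and the digit: the letter
      # goes to `grp`, the digit names the output column (via ".value")
      pivot_longer(cols = -id,  names_to = c("grp", ".value"), 
            names_sep="(?<=[A-Z])(?=[0-9])", values_drop_na = TRUE) %>% 
      select(-grp) %>%
      # prefix every column except the first (id) with "Col"
      rename_at(-1, ~ str_c('Col', .))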
# A tibble: 7 x 5
#     id Col1  Col2  Col3  Col4 
#  <int> <chr> <chr> <chr> <chr>
#1     1 a     b     c     d    
#2     1 e     f     g     h    
#3     1 i     j     k     l    
#4     2 m     n     o     p    
#5     2 q     <NA>  <NA>  <NA> 
#6     3 r     s     t     u    
#7     3 w     v     y     z    

Data

df <- structure(list(id = 1:3, A1 = c("a", "m", "r"), A2 = c("b", "n", 
"s"), A3 = c("c", "o", "t"), A4 = c("d", "p", "u"), B1 = c("e", 
"q", "w"), B2 = c("f", NA, "v"), B3 = c("g", NA, "y"), B4 = c("h", 
NA, "z"), C1 = c("i", NA, NA), C2 = c("j", NA, NA), C3 = c("k", 
NA, NA), C4 = c("l", NA, NA), D1 = c(NA, NA, NA), D2 = c(NA, 
NA, NA), D3 = c(NA, NA, NA), D4 = c(NA, NA, NA)), class = "data.frame",
row.names = c("1", 
"2", "3"))
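
The split of the column names is the key step in this answer, so here is a minimal sketch of the same idea on a smaller table. It is not the answerer's code: the `toy` data frame and its values are made up for illustration, and it swaps the lookaround `names_sep` for `names_pattern` with two capture groups and `rename_at()` for `rename_with()`, assuming tidyr >= 1.0.0 and dplyr >= 1.0.0.

```r
library(dplyr)
library(tidyr)

# Hypothetical stand-in for the answer's data: an id column plus two groups
# (A, B) of two numbered columns each.
toy <- data.frame(
  id = 1:2,
  A1 = c("a", "m"), A2 = c("b", "n"),
  B1 = c("e", NA),  B2 = c("f", NA),
  stringsAsFactors = FALSE
)

long <- toy %>%
  pivot_longer(
    cols = -id,
    # first capture group (letter) -> `grp`; second (digits) -> output column
    names_to = c("grp", ".value"),
    names_pattern = "([A-Z])([0-9]+)",
    # drop rows in which every value column is NA (here: id 2, group B)
    values_drop_na = TRUE
  ) %>%
  select(-grp) %>%
  rename_with(~ paste0("Col", .x), .cols = -id)

print(long)
```

Each letter group of an id becomes one row, and a row is dropped only when all of its values are missing, which is why a partially filled group such as `q, NA, NA, NA` in the answer's data survives.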

Answer 1 (score: 1)

I bet there are more elegant solutions, but this one uses tidyr and dplyr.

Suppose your data looks like

> df
# A tibble: 3 x 17
     id A1    A2    A3    A4    B1    B2    B3    B4    C1    C2    C3    C4    D1    D2    D3    D4   
  <dbl> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1     1 a     b     c     d     e     f     g     h     i     j     k     l     NA    NA    NA    NA   
2     2 m     n     o     p     q     NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA   
3     3 r     s     t     u     w     v     y     z     NA    NA    NA    NA    NA    NA    NA    NA

I replaced your X1 X2 X3 X4, ... with an index column and added a q in column B1.

Using

df %>%
  pivot_longer(cols=matches("\\d$"), 
               names_to = c("set"),
               names_pattern = ".(.)") %>%
  pivot_wider(names_from="set", 
              names_prefix="Col",
              values_fn = list) %>%
  unnest(matches("\\d$")) %>%
  rowwise() %>%
  filter(sum(is.na(c_across(matches("\\d$")))) != ncol(.) - 1)  # -1 because of the indexing column

returns

# A tibble: 7 x 5
# Rowwise: 
     id Col1  Col2  Col3  Col4 
  <dbl> <chr> <chr> <chr> <chr>
1     1 a     b     c     d    
2     1 e     f     g     h    
3     1 i     j     k     l    
4     2 m     n     o     p    
5     2 q     NA    NA    NA   
6     3 r     s     t     u    
7     3 w     v     y     z
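
The final `rowwise()` / `c_across()` step only removes rows whose Col columns are all NA. As a possible alternative (not part of the answer above), the same filter can be sketched with `if_any()`, assuming dplyr >= 1.0.4. The `reshaped` data frame below is a hypothetical stand-in for the intermediate result after `unnest()`.

```r
library(dplyr)

# Hypothetical intermediate result: the last row is entirely NA in the Col columns.
reshaped <- data.frame(
  id   = c(1, 1, 2, 2, 2),
  Col1 = c("a", "e", "m", "q", NA),
  Col2 = c("b", "f", "n", NA, NA),
  stringsAsFactors = FALSE
)

# Keep a row if at least one Col column is not NA.
kept <- reshaped %>%
  filter(if_any(starts_with("Col"), ~ !is.na(.x)))

print(kept)
```

Because no `rowwise()` is involved, the result is an ordinary ungrouped data frame rather than the rowwise tibble shown in the output above.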