How do I convert this wide data frame into this long data frame?

Time: 2020-03-21 21:38:57

Tags: r tidyverse

How can I convert this wide data frame

# A tibble: 2 x 7
  name  question_1  question_1_response question_2    question_2_response question_3 question_3_response
  <chr> <chr>                     <dbl> <chr>                       <dbl> <chr>                    <dbl>
1 ken   PC1,PC2,PC4                 4.5 PC3,MK1,MK2                   3.5 SBP1,SBP5                    5
2 hello PC1,PC5                     4   MK1,SBP1,SBP2                 4   NA                          NA

into this long one?

# A tibble: 13 x 3
   name  subcomp value
   <chr> <chr>   <dbl>
 1 ken   PC1       4.5
 2 ken   PC2       4.5
 3 ken   PC4       4.5
 4 ken   PC3       3.5
 5 ken   MK1       3.5
 6 ken   MK2       3.5
 7 ken   SBP1      5  
 8 ken   SBP5      5  
 9 hello PC1       4  
10 hello PC5       4  
11 hello MK1       4  
12 hello SBP1      4  
13 hello SBP2      4  

Sample data:

library(tidyverse)
test <- tribble(
  ~name, ~question_1, ~question_1_response, ~question_2, ~question_2_response, ~question_3, ~question_3_response,
  "ken", "PC1,PC2,PC4", 4.5, "PC3,MK1,MK2", 3.5, "SBP1,SBP5", 5,
  "hello", "PC1,PC5", 4, "MK1,SBP1,SBP2", 4, NA, NA
) 

I tried using gather/separate/spread but couldn't quite get my head around it. Thanks very much!

2 answers:

Answer 0: (score: 1)

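We can rename the columns that end with 'response' using rename_at with str_replace. The pattern captures the digits (\\d+) after the first _ and the word (\\w+) that follows the second _. In the replacement, the backreferences to the capture groups are given in reverse order (\\2_\\1), so question_1_response becomes questionresponse_1 while question_1 is left unchanged. We then reshape to 'long' format with pivot_longer and split the comma-separated values into rows with separate_rows.

library(dplyr)
library(tidyr)
library(stringr)

test %>%
  rename_at(vars(ends_with('response')),
            ~ str_replace(., '_(\\d+)_(\\w+)', '\\2_\\1')) %>%
  pivot_longer(cols = -name, names_to = c('.value', 'group'),
               names_sep = "_", values_drop_na = TRUE) %>%
  separate_rows(question) %>%
  select(name, subcomp = question, value = questionresponse)
# A tibble: 13 x 3
#   name  subcomp value
#   <chr> <chr>   <dbl>
# 1 ken   PC1       4.5
# 2 ken   PC2       4.5
# 3 ken   PC4       4.5
# 4 ken   PC3       3.5
# 5 ken   MK1       3.5
# 6 ken   MK2       3.5
# 7 ken   SBP1      5
# 8 ken   SBP5      5
# 9 hello PC1       4
#10 hello PC5       4
#11 hello MK1       4
#12 hello SBP1      4
#13 hello SBP2      4

Another option is data.table.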
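The data.table code for this answer is not reproduced here. As a rough sketch of how that route might look (an assumption, not the answerer's code), melt() with patterns() can stack the question columns and the response columns in parallel, after which strsplit() expands each comma-separated string. The names dt, long and result are illustrative, and the sample data is rebuilt inline so the block runs on its own.

```r
library(data.table)

# Same sample data as in the question, built directly as a data.table.
dt <- data.table(
  name = c("ken", "hello"),
  question_1 = c("PC1,PC2,PC4", "PC1,PC5"),
  question_1_response = c(4.5, 4),
  question_2 = c("PC3,MK1,MK2", "MK1,SBP1,SBP2"),
  question_2_response = c(3.5, 4),
  question_3 = c("SBP1,SBP5", NA),
  question_3_response = c(5, NA)
)

# Melt the question columns and the response columns side by side:
# the first pattern feeds "subcomp", the second feeds "value".
long <- melt(
  dt,
  id.vars = "name",
  measure.vars = patterns("^question_[0-9]+$", "_response$"),
  value.name = c("subcomp", "value")
)

# Drop the unanswered question, then split each comma-separated
# string into one row per component.
result <- long[
  !is.na(subcomp),
  .(subcomp = unlist(strsplit(subcomp, ",", fixed = TRUE))),
  by = .(name, variable, value)
][, .(name, subcomp, value)]

print(result)
```

Note that rows come out question by question (all of question 1, then question 2, and so on) rather than person by person, so the order differs from the desired output even though the 13 rows are the same.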

Answer 1: (score: 0)

One option involving dplyr, tidyr and purrr could be:

map_dfr(.x = split.default(test[-1], ceiling(1:length(test[-1])/2)),
        ~ .x %>%
         rowid_to_column() %>%
         separate_rows(2) %>%
         setNames(c("rowid", "subcomb", "value"))) %>%
 left_join(test %>%
            rowid_to_column() %>%
            select(rowid, name), by = c("rowid" = "rowid")) %>%
 filter(!is.na(subcomb))

   rowid subcomb value name 
   <int> <chr>   <dbl> <chr>
 1     1 PC1       4.5 ken  
 2     1 PC2       4.5 ken  
 3     1 PC4       4.5 ken  
 4     2 PC1       4   hello
 5     2 PC5       4   hello
 6     1 PC3       3.5 ken  
 7     1 MK1       3.5 ken  
 8     1 MK2       3.5 ken  
 9     2 MK1       4   hello
10     2 SBP1      4   hello
11     2 SBP2      4   hello
12     1 SBP1      5   ken  
13     1 SBP5      5   ken
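
The grouping step in the code above, split.default(test[-1], ceiling(1:length(test[-1])/2)), is what pairs each question column with its response column. A small sketch of just that index arithmetic, using a plain character vector of the column names instead of the data frame (cols, grp and pairs are illustrative names, not from the answer):

```r
# Column names of test[-1], i.e. everything except `name`.
cols <- c("question_1", "question_1_response",
          "question_2", "question_2_response",
          "question_3", "question_3_response")

# Dividing positions 1..6 by 2 and rounding up yields 1 1 2 2 3 3,
# so every adjacent pair of columns shares one group id.
grp <- ceiling(seq_along(cols) / 2)

# split() then returns one element per question/response pair.
pairs <- split(cols, grp)
print(pairs)
```

This relies on the columns being laid out strictly as question, response, question, response; if the column order changed, the pairs would no longer line up.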