Merging columns into a single column according to a dictionary

Time: 2019-12-18 17:10:22

Tags: r merge

I have this data.frame:

# Assigned to df1, the name the answers below refer to
df1 <- data.frame(identifiant = paste0("I", 1:6),
                  project1 = c(1,NA,3,NA,NA,NA), project2 = c(NA,NA,NA,4,5,NA), project3 = c(NA,NA,NA,NA,NA,6),
                  subject1 = c("A","B",NA,NA,NA,NA), subject2 = c(NA,NA,"C","D","E",NA), subject3 = c(NA,NA,NA,NA,NA,"F"))

which gives this data:

   identifiant project1 project2 project3 subject1 subject2 subject3
1          I1        1       NA       NA        A     <NA>     <NA>
2          I2       NA       NA       NA        B     <NA>     <NA>
3          I3        3       NA       NA     <NA>        C     <NA>
4          I4       NA        4       NA     <NA>        D     <NA>
5          I5       NA        5       NA     <NA>        E     <NA>
6          I6       NA       NA        6     <NA>     <NA>        F

I want this data:

   identifiant project subject
1          I1       1       A
2          I2      NA       B
3          I3       3       C
4          I4       4       D
5          I5       5       E
6          I6       6       F

So the columns should be merged into one, but according to a "data dictionary":

dictionary <- data.frame(old = c("identifiant","project1", "project2", "project3","subject1", "subject2", "subject3"),
                          new = c("identifiant","project","project","project","subject","subject","subject")) 

          old         new
1 identifiant identifiant
2    project1     project
3    project2     project
4    project3     project
5    subject1     subject
6    subject2     subject
7    subject3     subject

Does anyone have a solution that is not too complicated?

2 answers:

Answer 0: (score: 2)

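Another option is to apply coalesce to each of the two groups of columns ("project", "subject"). The subject columns are converted to character first, because before R 4.0 data.frame() stores strings as factors by default.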

library(dplyr)
df1 %>% 
   mutate_at(vars(starts_with('subject')), as.character)%>% 
   transmute(identifiant, 
             project = coalesce(!!! select(., starts_with('project'))), 
             subject = coalesce(!!! select(., starts_with('subject'))))
#  identifiant project subject
#1          I1       1       A
#2          I2      NA       B
#3          I3       3       C
#4          I4       4       D
#5          I5       5       E
#6          I6       6       F
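To make the element-wise behaviour of coalesce concrete, here is a minimal sketch on plain vectors. The vectors simply copy the project columns from the question; the variable names are only for illustration.

```r
library(dplyr)

# Toy vectors mirroring project1..project3 from the question
project1 <- c(1, NA, 3, NA, NA, NA)
project2 <- c(NA, NA, NA, 4, 5, NA)
project3 <- c(NA, NA, NA, NA, NA, 6)

# coalesce() returns, position by position, the first non-missing value;
# a position that is NA in every input stays NA
project <- coalesce(project1, project2, project3)
print(project)
```

The second position stays NA because I2 has no project value in any of the three columns.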

Or, if we need to use the "dictionary" dataset, split it into a list based on the "new" values

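library(purrr)
dictionary %>%
     group_split(new, keep = FALSE) %>%          # one tibble per value of 'new', sorted by 'new'
     map_dfc(~ .x %>% 
                 pull(old) %>%                   # old column names of this group
                 as.character() %>%              # in case 'old' is stored as a factor
                 select(df1, .) %>%
                 mutate_all(as.character) %>% 
                 transmute(new = coalesce(!!! .))) %>% 
     # relies on the sorted group order matching the order of unique(dictionary$new)
     set_names(as.character(unique(dictionary$new)))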

Or, if we use a named list, we can use imap and assign the new column name with :=

dictionary %>%
    {split(as.character(.$old), .$new)} %>% 
    imap_dfc(~ select(df1, .x) %>% 
           mutate_all(as.character) %>%
           transmute(!! .y := coalesce(!!! .)))
#  identifiant project subject
#1          I1       1       A
#2          I2    <NA>       B
#3          I3       3       C
#4          I4       4       D
#5          I5       5       E
#6          I6       6       F
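For comparison, the same dictionary-driven merge can be sketched in base R without any packages. This is only an illustration under stated assumptions: `first_non_na` is a hypothetical helper (not part of any library), and both data frames are built with `stringsAsFactors = FALSE` so that the columns are plain character vectors.

```r
# Data from the question, with strings kept as character
df1 <- data.frame(identifiant = paste0("I", 1:6),
                  project1 = c(1, NA, 3, NA, NA, NA),
                  project2 = c(NA, NA, NA, 4, 5, NA),
                  project3 = c(NA, NA, NA, NA, NA, 6),
                  subject1 = c("A", "B", NA, NA, NA, NA),
                  subject2 = c(NA, NA, "C", "D", "E", NA),
                  subject3 = c(NA, NA, NA, NA, NA, "F"),
                  stringsAsFactors = FALSE)

dictionary <- data.frame(
  old = c("identifiant", "project1", "project2", "project3",
          "subject1", "subject2", "subject3"),
  new = c("identifiant", "project", "project", "project",
          "subject", "subject", "subject"),
  stringsAsFactors = FALSE)

# Hypothetical helper: element-wise first non-NA value across vectors
first_non_na <- function(...) {
  Reduce(function(a, b) ifelse(is.na(a), b, a), list(...))
}

# Group the old column names by their new name, keeping dictionary order
groups <- split(dictionary$old,
                factor(dictionary$new, levels = unique(dictionary$new)))

# Merge each group of columns into a single column
merged <- as.data.frame(
  lapply(groups, function(cols) {
    do.call(first_non_na, unname(as.list(df1[cols])))
  }),
  stringsAsFactors = FALSE)

print(merged)
```

Fixing the factor levels to `unique(dictionary$new)` keeps the output columns in dictionary order instead of alphabetical order.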

Answer 1: (score: 1)

With dplyr and tidyr, we can do:

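library(dplyr)
df1 %>% 
   tidyr::pivot_longer(cols = contains("project"),
                       names_to = "proj",
                       values_to = "vals") %>% 
   # keeps only rows with a project value; an identifier whose
   # project columns are all NA is dropped entirely by this step
   filter(!is.na(vals)) %>% 
   tidyr::pivot_longer(cols = contains("subject"),
                       names_to = "subj",
                       values_to = "other_val") %>% 
   na.omit() %>% 
   rename(project = vals, subject = other_val) %>% 
   select(-proj, -subj)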
# Output as posted; I2 shows project 2, so the input used here had a non-NA project for I2
# A tibble: 6 x 3
  identifiant project subject
  <fct>         <dbl> <fct>  
1 I1                1 A      
2 I2                2 B      
3 I3                3 C      
4 I4                4 D      
5 I5                5 E      
6 I6                6 F
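Because `filter(!is.na(vals))` keeps only rows with a non-missing project value, an identifier with no project at all does not survive the pipeline. A minimal base R check, assuming the project columns exactly as given in the question, shows which identifiers are affected:

```r
# Project columns from the question, with strings kept as character
df1 <- data.frame(identifiant = paste0("I", 1:6),
                  project1 = c(1, NA, 3, NA, NA, NA),
                  project2 = c(NA, NA, NA, 4, 5, NA),
                  project3 = c(NA, NA, NA, NA, NA, 6),
                  stringsAsFactors = FALSE)

# Names of all project columns
project_cols <- grep("^project", names(df1), value = TRUE)

# TRUE for rows in which every project column is NA
all_missing <- rowSums(!is.na(df1[project_cols])) == 0

# Identifiers that would be dropped by filtering on non-missing project values
dropped <- df1$identifiant[all_missing]
print(dropped)
```

If such rows must be kept, the coalesce approach from the other answer leaves them in place with an NA project.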