我有这个data.frame:
data.frame(identifiant = paste0("I",c(1:6)),
project1 = c(1,NA,3,NA,NA,NA),project2 = c(NA,NA,NA,4,5,NA),project3 = c(NA,NA,NA,NA,NA,6),
subject1 = c("A","B",NA,NA,NA,NA),subject2 = c(NA,NA,"C","D","E",NA),subject3 = c(NA,NA,NA,NA,NA,"F"))
此数据:
identifiant project1 project2 project3 subject1 subject2 subject3
1 I1 1 NA NA A <NA> <NA>
2 I2 NA NA NA B <NA> <NA>
3 I3 3 NA NA <NA> C <NA>
4 I4 NA 4 NA <NA> D <NA>
5 I5 NA 5 NA <NA> E <NA>
6 I6 NA NA 6 <NA> <NA> F
我想要这个数据:
identifiant project subject
1 I1 1 A
2 I2 NA B
3 I3 3 C
4 I4 4 D
5 I5 5 E
6 I6 6 F
所以一列合并但根据一个“数据字典”:
dictionary <- data.frame(old = c("identifiant","project1", "project2", "project3","subject1", "subject2", "subject3"),
new = c("identifiant","project","project","project","subject","subject","subject"))
old new
1 identifiant identifiant
2 project1 project
3 project2 project
4 project3 project
5 subject1 subject
6 subject2 subject
7 subject3 subject
任何人都可以解决而不复杂的事情吗?
答案 0 :(得分:2)
另一种选择是coalesce
应用于两组列(“项目”,“主题”)
library(dplyr)
df1 %>%
mutate_at(vars(starts_with('subject')), as.character)%>%
transmute(identifiant,
project = coalesce(!!! select(., starts_with('project'))),
subject = coalesce(!!! select(., starts_with('subject'))))
# identifiant project subject
#1 I1 1 A
#2 I2 2 B
#3 I3 3 C
#4 I4 4 D
#5 I5 5 E
#6 I6 6 F
或者,如果我们需要使用“字典”数据集,则根据“新”值将其拆分为list
library(purrr)
dictionary %>%
group_split(new, keep = FALSE) %>%
map_dfc(~ .x %>%
pull(old) %>%
as.character(.x) %>%
select(df1, .) %>%
mutate_all(as.character) %>%
transmute(new = coalesce(!!! .))) %>%
set_names(unique(dictionary$new))
或者,如果我们使用命名的list
,则可以在作业imap
中使用:=
dictionary %>%
{split(as.character(.$old), .$new)} %>%
imap_dfc(~ select(df1, .x) %>%
mutate_all(as.character) %>%
transmute(!! .y := coalesce(!!! .)))
# identifiant project subject
#1 I1 1 A
#2 I2 2 B
#3 I3 3 C
#4 I4 4 D
#5 I5 5 E
#6 I6 6 F
答案 1 :(得分:1)
有了dplyr
,我们可以做到:
df %>%
tidyr::pivot_longer(cols= contains("project"),
names_to="proj",
values_to = "vals") %>%
filter(!is.na(vals)) %>%
tidyr::pivot_longer(cols=contains("subject"),
names_to = "subj",
values_to = "other_val") %>%
na.omit() %>%
rename(project=vals, subject=other_val) %>%
select(-proj,-subj)
# A tibble: 6 x 3
identifiant project subject
<fct> <dbl> <fct>
1 I1 1 A
2 I2 2 B
3 I3 3 C
4 I4 4 D
5 I5 5 E
6 I6 6 F