Please help... My dataset contains the degree obtained in each year, as follows:
Year1 Deg_Year1 Year2 Deg_Year2 Year3 Deg_Year3 Year4 Deg_Year4 Year5 Deg_Year5
2001 College 2004 Master NA NA NA NA NA NA
2004 College 2004 Master 2010 PHD NA NA NA NA
2006 Master 2006 College NA NA NA NA NA NA
2016 Master NA NA NA NA NA NA NA NA
2002 Master 2003 Master 2004 College 2004 Master NA NA
2002 Master 2002 College NA NA NA NA NA NA
I would like to get a data frame with the highest degree obtained before 2015 and the year it was obtained, like this:
YearX Highest_Degree
2004 Master
2010 PHD
2006 Master
NA NA
2004 Master
2002 Master
Can anyone help me? Thanks!
Answer:
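We can create a vector of the degrees in increasing order of rank, match the 'Deg_Year' columns against it, use max.col to get the column index of the maximum value in each row, and use that index to extract the degree and the corresponding 'Year' from each row.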
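A toy illustration of the two building blocks, match() with nomatch = 0 and max.col(), on a small made-up matrix of degrees (not the question's data). The rank order College < Master < PHD is an assumption. max.col() breaks ties at random by default, so ties.method = "last" is passed here to make the result reproducible and take the right-most tied column.

```r
# Degrees in increasing rank (assumed order)
v1 <- c("College", "Master", "PHD")

# A small made-up matrix of degrees, one person per row
degs <- rbind(c("Master", "Master", "College", "Master"),
              c(NA, "PHD", "College", NA),
              c(NA, NA, NA, NA))

# match() replaces each degree by its rank; nomatch = 0 turns NA into 0
m <- matrix(match(degs, v1, nomatch = 0), nrow = nrow(degs))
m

# max.col() gives the column of each row's maximum; the default ties.method
# is "random", so "last" is used here to take the right-most tied column
idx <- max.col(m, ties.method = "last")

# Rows with no recognised degree: NA^TRUE is NA, NA^FALSE is 1
idx <- idx * NA^(!rowSums(m != 0))
idx  # 4 2 NA
```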
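# degree ranks, lowest first
v1 <- c('College', 'Master', 'PHD')
nm1 <- grep('Deg', names(df1))
# rank of each degree; 0 for NA or anything not in v1
m1 <- sapply(df1[nm1], match, table = v1, nomatch = 0)
# ignore degrees obtained in 2015 or later (each Year column sits just before its Deg column)
yrs <- as.matrix(df1[nm1 - 1])
m1[is.na(yrs) | yrs >= 2015] <- 0
# column of the row maximum (ties go to the last column); NA when no degree qualifies
i1 <- max.col(m1, ties.method = 'last') * NA^(!rowSums(m1 != 0))
YearX <- df1[nm1-1][cbind(seq_len(nrow(df1)), i1)]
Highest_Degree <- df1[nm1][cbind(seq_len(nrow(df1)), i1)]
data.frame(YearX, Highest_Degree)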
# YearX Highest_Degree
#1 2004 Master
#2 2010 PHD
#3 2006 Master
#4 NA <NA>
#5 2004 Master
#6 2002 Master
# data
df1 <- structure(list(Year1 = c(2001L, 2004L, 2006L, 2016L, 2002L, 2002L
), Deg_Year1 = c("College", "College", "Master", "College", "Master",
"Master"), Year2 = c(2004L, 2004L, 2006L, NA, 2003L, 2002L),
Deg_Year2 = c("Master", "Master", "College", NA, "Master",
"College"), Year3 = c(NA, 2010L, NA, NA, 2004L, NA), Deg_Year3 = c(NA,
"PHD", NA, NA, "College", NA), Year4 = c(NA, NA, NA, NA,
2004L, NA), Deg_Year4 = c(NA, NA, NA, NA, "Master", NA),
Year5 = c(NA, NA, NA, NA, NA, NA), Deg_Year5 = c(NA, NA,
NA, NA, NA, NA)), .Names = c("Year1", "Deg_Year1", "Year2",
"Deg_Year2", "Year3", "Deg_Year3", "Year4", "Deg_Year4", "Year5",
"Deg_Year5"), class = "data.frame", row.names = c(NA, -6L))
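A different technique, for comparison and not the answer's method: a base R sketch that stacks the Year/Deg pairs into long format, keeps only degrees obtained before 2015, and takes the best-ranked degree for each original row. It assumes the rank order College < Master < PHD and, as the expected output for row 5 suggests, takes the latest year when the same highest degree appears more than once. df1 is rebuilt with data.frame() so the block runs on its own.

```r
# Rebuild df1 (same values as the dput() data above) so this block is self-contained
df1 <- data.frame(
  Year1     = c(2001L, 2004L, 2006L, 2016L, 2002L, 2002L),
  Deg_Year1 = c("College", "College", "Master", "College", "Master", "Master"),
  Year2     = c(2004L, 2004L, 2006L, NA, 2003L, 2002L),
  Deg_Year2 = c("Master", "Master", "College", NA, "Master", "College"),
  Year3     = c(NA, 2010L, NA, NA, 2004L, NA),
  Deg_Year3 = c(NA, "PHD", NA, NA, "College", NA),
  Year4     = c(NA, NA, NA, NA, 2004L, NA),
  Deg_Year4 = c(NA, NA, NA, NA, "Master", NA),
  Year5     = NA,
  Deg_Year5 = NA,
  stringsAsFactors = FALSE
)

v1 <- c("College", "Master", "PHD")   # assumed rank order, lowest first
yr_cols  <- grep("^Year", names(df1))
deg_cols <- grep("^Deg", names(df1))

# Stack the Year/Deg pairs into one long table, remembering the original row
long <- data.frame(
  row  = rep(seq_len(nrow(df1)), times = length(yr_cols)),
  year = unlist(df1[yr_cols], use.names = FALSE),
  deg  = unlist(df1[deg_cols], use.names = FALSE),
  stringsAsFactors = FALSE
)

# Keep known degrees obtained before 2015
long <- long[!is.na(long$year) & long$year < 2015 & long$deg %in% v1, ]
long$rank <- match(long$deg, v1)

# Per original row: highest rank first, then latest year; keep the first entry
long <- long[order(long$row, -long$rank, -long$year), ]
best <- long[!duplicated(long$row), ]

# One output row per input row; rows with no qualifying degree get NA
idx <- match(seq_len(nrow(df1)), best$row)
res <- data.frame(YearX = best$year[idx],
                  Highest_Degree = best$deg[idx],
                  stringsAsFactors = FALSE)
res
```

The long layout puts the year cutoff in one filter and the tie-break in one sort, which may be easier to adjust than the matrix indexing if the rules change.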