如何订购数据集的行信息

时间:2017-12-10 16:05:03

标签: r rows

请帮帮我...我的数据集包含每年学位的信息,如下:

Year1  Deg_Year1  Year2  Deg_Year2  Year3  Deg_Year3  Year4  Deg_Year4  Year5  Deg_Year5   
2001   College    2004   Master      NA      NA        NA       NA        NA       NA
2004   College    2004   Master      2010    PHD       NA       NA        NA       NA 
2006   Master     2006   College     NA      NA        NA       NA        NA       NA
2016   Master     NA     NA          NA      NA        NA       NA        NA       NA
2002   Master     2003   Master      2004    College   2004     Master    NA       NA  
2002   Master     2002   College     NA      NA        NA       NA        NA       NA 

我想获得一个包含2015年之前获得的年份和最高学位的数据框,如下所示:

YearX   Highest_Degree
2004    Master
2010    PHD
2006    Master
NA      NA
2004    Master
2002    Master

有人可以帮帮我吗? 谢谢!

1 个答案:

答案 0 :(得分:1)

我们可以在订单中创建vector度数,然后针对' Deg_Year'创建match度数。列,使用max.col获取每行中的最大值,以对值进行分组以及相应的“年份”。在每一行

v1 <- c('Master', 'PHD')
nm1 <- grep('Deg', names(df1))
m1 <- sapply(df1[nm1], match, table = v1, nomatch = 0)
i1 <- max.col(m1) * NA^(!rowSums(m1!=0))
YearX <- df1[nm1-1][cbind(seq_len(nrow(df1)), i1)]
Highest_Degree <- df1[nm1][cbind(seq_len(nrow(df1)), i1)]
data.frame(YearX, Highest_Degree)
#  YearX Highest_Degree
#1  2004         Master
#2  2010            PHD
#3  2006         Master
#4    NA           <NA>
#5  2004         Master
#6  2002         Master

数据

 df1 <- structure(list(Year1 = c(2001L, 2004L, 2006L, 2016L, 2002L, 2002L
), Deg_Year1 = c("College", "College", "Master", "College", "Master", 
"Master"), Year2 = c(2004L, 2004L, 2006L, NA, 2003L, 2002L), 
    Deg_Year2 = c("Master", "Master", "College", NA, "Master", 
    "College"), Year3 = c(NA, 2010L, NA, NA, 2004L, NA), Deg_Year3 = c(NA, 
    "PHD", NA, NA, "College", NA), Year4 = c(NA, NA, NA, NA, 
    2004L, NA), Deg_Year4 = c(NA, NA, NA, NA, "Master", NA), 
    Year5 = c(NA, NA, NA, NA, NA, NA), Deg_Year5 = c(NA, NA, 
    NA, NA, NA, NA)), .Names = c("Year1", "Deg_Year1", "Year2", 
"Deg_Year2", "Year3", "Deg_Year3", "Year4", "Deg_Year4", "Year5", 
"Deg_Year5"), class = "data.frame", row.names = c(NA, -6L))