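How to get the previous matching value

Time: 2017-07-11 10:45:36

Tags: r dataframe

When a value of b matches the nearest earlier value of b, how can I get the value of "a" from that earlier row? For example, row 3 of b matches row 1, and row 6 matches row 4.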

df <- data.frame(year = c(2013,2013,2014,2014,2014,2015,2015,2015,2016,2016,2016),
           a = c(10,11,NA,13,22,NA,19,NA,10,15,NA),
           b = c(30.133,29,30.1223,33,17,33,11,17,14,13.913,14))

  year  a   b   *NEW*
2013    10  30.133  NA
2013    11  29      NA
2014    NA  30.1223 10
2014    13  33      NA
2014    22  17      NA
2015    NA  33      13
2015    19  11      NA
2015    NA  17      22
2016    10  14      NA
2016    15  13.913  10
2016    NA  14      15

Thanks

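1 Answer:

Answer 0 (score: 3)

For the OP's example case

One approach is to use the duplicated() function. Note that b is entered below as whole numbers (30.133 and 30.1223 become 30, 13.913 becomes 14), because the expected output in the question treats those values as matching.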

# Input dataframe    
df <- data.frame(year = c(2013,2013,2014,2014,2014,2015,2015,2015,2016,2016,2016),
                     a = c(10,11,NA,13,22,NA,19,NA,10,15,NA),
                     b = c(30,29,30,33,17,33,11,17,14,14,14))

# creating a new column with default values
df$NEW <- NA

# updating the value using the previous matching position
df$NEW[duplicated(df$b)] <- df$a[duplicated(df$b,fromLast = TRUE)]

# expected output
df
#    year  a  b NEW
# 1  2013 10 30  NA
# 2  2013 11 29  NA
# 3  2014 NA 30  10
# 4  2014 13 33  NA
# 5  2014 22 17  NA
# 6  2015 NA 33  13
# 7  2015 19 11  NA
# 8  2015 NA 17  22
# 9  2016 10 14  NA
# 10 2016 15 14  10
# 11 2016 NA 14  15
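
To see why the single assignment above works, it may help to look at the two logical masks separately. This is only an illustrative sketch; the variable names `to_fill` and `to_copy` are mine, not part of the answer. The assignment pairs the k-th TRUE of one mask with the k-th TRUE of the other, which lines up correctly here because the matching pairs do not interleave.

```r
# b values from the example above
b <- c(30, 29, 30, 33, 17, 33, 11, 17, 14, 14, 14)

# TRUE where the value already appeared earlier: the rows that receive a value
to_fill <- duplicated(b)

# TRUE where the value appears again later: the rows whose `a` is copied
to_copy <- duplicated(b, fromLast = TRUE)

which(to_fill)  # 3 6 8 10 11
which(to_copy)  # 1 4 5 9 10
```

Both masks always contain the same number of TRUE values, so the assignment never recycles, but the positional pairing is only right when the k-th source row really is the previous match of the k-th target row.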

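General usage

The solution above fails when the duplicates are not in order, that is, when the matching pairs interleave so that the k-th duplicated row no longer corresponds to the k-th source row. Following @DavidArenburg's suggestion, I changed the fourth element with df$b[4] <- 14. A general solution needs another handy function, order(), and should work for the different possible cases.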

# Input dataframe    
df <- data.frame(year = c(2013,2013,2014,2014,2014,2015,2015,2015,2016,2016,2016),
                 a = c(10,11,NA,13,22,NA,19,NA,10,15,NA),
                 b = c(30,29,30,14,17,33,11,17,14,14,14))

# creating a new column with default values
df$NEW <- NA

# sort the rows by the matching column so equal values of b become adjacent
# (ties keep their original relative order)
df <- df[order(df$b),]

# updating the value using the previous matching position
df$NEW[duplicated(df$b)] <- df$a[duplicated(df$b,fromLast = TRUE)]

# restore the original row order (assumes the default integer row names)
df <- df[order(as.integer(rownames(df))),]

# expected output
df
#    year  a  b NEW
# 1  2013 10 30  NA
# 2  2013 11 29  NA
# 3  2014 NA 30  10
# 4  2014 13 14  NA
# 5  2014 22 17  NA
# 6  2015 NA 33  NA
# 7  2015 19 11  NA
# 8  2015 NA 17  22
# 9  2016 10 14  13
# 10 2016 15 14  10
# 11 2016 NA 14  15
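
As an aside, the sort and re-sort steps can probably be avoided. The following is a minimal sketch of a different technique, not the method used above: it groups by b with base R's ave() and shifts a down by one position inside each group. The helper name `lag_one` is made up for illustration.

```r
# Same input as the general example above
df <- data.frame(year = c(2013,2013,2014,2014,2014,2015,2015,2015,2016,2016,2016),
                 a = c(10,11,NA,13,22,NA,19,NA,10,15,NA),
                 b = c(30,29,30,14,17,33,11,17,14,14,14))

# Hypothetical helper: shift a vector down by one, padding the front with NA
lag_one <- function(x) c(NA_real_, head(x, -1))

# Apply the shift within each group of equal b; ave() returns the result
# in the original row order, so no sorting is needed
df$NEW <- ave(df$a, df$b, FUN = lag_one)

df$NEW
# expected: NA NA 10 NA NA NA NA 22 13 10 15
```

The first row of each group has no earlier match and therefore gets NA, which agrees with the output shown above.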

The solutions here use only functions from the base package. I'm sure there are other ways to do this with other packages.