如果b的值与最近的前一个值匹配,如何获得“a”的值,例如,b的行$ 3与前一行$ 1匹配,行$ 6与行$ 4匹配
df <- data.frame(year = c(2013,2013,2014,2014,2014,2015,2015,2015,2016,2016,2016),
a = c(10,11,NA,13,22,NA,19,NA,10,15,NA),
b = c(30.133,29,30.1223,33,17,33,11,17,14,13.913,14))
year a b *NEW*
2013 10 30.133 NA
2013 11 29 NA
2014 NA 30.1223 10
2014 13 33 NA
2014 22 17 NA
2015 NA 33 13
2015 19 11 NA
2015 NA 17 22
2016 10 14 NA
2016 15 13.913 10
2016 NA 14 15
由于
答案 0 :(得分:3)
一种方法是使用duplicated()
功能。
# Input dataframe
df <- data.frame(year = c(2013,2013,2014,2014,2014,2015,2015,2015,2016,2016,2016),
a = c(10,11,NA,13,22,NA,19,NA,10,15,NA),
b = c(30,29,30,33,17,33,11,17,14,14,14))
# creating a new column with default values
df$NEW <- NA
# updating the value using the previous matching position
df$NEW[duplicated(df$b)] <- df$a[duplicated(df$b,fromLast = TRUE)]
# expected output
df
# year a b NEW
# 1 2013 10 30 NA
# 2 2013 11 29 NA
# 3 2014 NA 30 10
# 4 2014 13 33 NA
# 5 2014 22 17 NA
# 6 2015 NA 33 13
# 7 2015 19 11 NA
# 8 2015 NA 17 22
# 9 2016 10 14 NA
# 10 2016 15 14 10
# 11 2016 NA 14 15
当重复项不按顺序排列时,上述解决方案失败。按照@DavidArenburg的建议。我更改了第四个元素df$b[4] <- 14
。一般解决方案需要使用另一个方便的函数order()
,并且应该适用于不同的可能情况。
# Input dataframe
df <- data.frame(year = c(2013,2013,2014,2014,2014,2015,2015,2015,2016,2016,2016),
a = c(10,11,NA,13,22,NA,19,NA,10,15,NA),
b = c(30,29,30,14,17,33,11,17,14,14,14))
# creating a new column with default values
df$NEW <- NA
# sort the matching column
df <- df[order(df$b),]
# updating the value using the previous matching position
df$NEW[duplicated(df$b)] <- df$a[duplicated(df$b,fromLast = TRUE)]
# To original order
df <- df[order(as.integer(rownames(df))),]
# expected output
df
# year a b NEW
# 1 2013 10 30 NA
# 2 2013 11 29 NA
# 3 2014 NA 30 10
# 4 2014 13 14 NA
# 5 2014 22 17 NA
# 6 2015 NA 33 NA
# 7 2015 19 11 NA
# 8 2015 NA 17 22
# 9 2016 10 14 13
# 10 2016 15 14 10
# 11 2016 NA 14 15
此处,解决方案基于base
包'功能。我相信还有其他方法可以使用其他软件包。