Calculate the maximum of selected rows

Time: 2019-09-02 15:05:10

Tags: r dataframe dplyr

The head of my data frame looks as follows:

structure(list(wbcode = c("ARG", "ARG", "ARG", "ARG", "ARG", 
"ARG", "ARG", "ARG", "ARG", "ARG", "ARG", "ARG", "ARG", "ARG", 
"ARG", "ARG", "ARG", "ARG", "ARG", "ARG", "ARG", "ARG", "ARG", 
"ARG", "ARG", "ARG"), End = c(NA, NA, NA, NA, NA, NA, 1982, NA, 
NA, NA, NA, NA, NA, NA, NA, 1991, NA, NA, NA, NA, NA, 1995, NA, 
NA, NA, NA), LS = c(0.958041958041958, 1.20320197044335, 1.16087598763312, 
0.354430888167198, 0.0475120757386165, 0.0236186492578896, 0.0916911204214743, 
0.14338253921938, 0.408800511837039, 0.385495983810026, 0.244688077879152, 
NA, NA, NA, NA, NA, 1.23774478543667, 1.06301680926773, 0.670834486120376, 
0.60283371506345, 0.437946526596944, 0.468570146238378, 0.30623825822946, 
0.0241300985598649, 0.0201213236433166, 0.0223558659752478), 
    year = c("1974", "1975", "1976", "1977", "1978", "1979", 
    "1980", "1981", "1982", "1983", "1984", "1985", "1986", "1987", 
    "1988", "1989", "1990", "1991", "1992", "1993", "1994", "1995", 
    "1996", "1997", "1998", "1999")), row.names = c(NA, -26L), class = c("tbl_df", 
"tbl", "data.frame"))

What I want to achieve is to create a new column LS_max that contains the maximum value of LS between year and End (if End exists). The resulting data frame would look as follows:

# A tibble: 26 x 5
#    wbcode   End     LS year  LS_max
#    <chr>  <dbl>  <dbl> <chr>  <dbl>
#  1 ARG       NA 0.958  1974      NA
#  2 ARG       NA 1.20   1975      NA
#  3 ARG       NA 1.16   1976      NA
#  4 ARG       NA 0.354  1977      NA
#  5 ARG       NA 0.0475 1978      NA
#  6 ARG       NA 0.0236 1979      NA
#  7 ARG     1982 0.0917 1980   0.409
#  8 ARG       NA 0.143  1981      NA
#  9 ARG       NA 0.409  1982      NA
# 10 ARG       NA 0.385  1983      NA
# 11 ARG       NA 0.245  1984      NA
# 12 ARG       NA NA     1985      NA
# 13 ARG       NA NA     1986      NA
# 14 ARG       NA NA     1987      NA
# 15 ARG       NA NA     1988      NA
# 16 ARG     1991 NA     1989    1.24
# 17 ARG       NA 1.24   1990      NA
# 18 ARG       NA 1.06   1991      NA
# 19 ARG       NA 0.671  1992      NA
# 20 ARG       NA 0.603  1993      NA
# 21 ARG       NA 0.438  1994      NA
# 22 ARG     1995 0.469  1995   0.469
# 23 ARG       NA 0.306  1996      NA
# 24 ARG       NA 0.0241 1997      NA
# 25 ARG       NA 0.0201 1998      NA
# 26 ARG       NA 0.0224 1999      NA

Please note that the original data frame contains more than one wbcode. Any help would be appreciated.

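2 answers:

Answer 0: (score: 1)

One option is to create a grouping column based on where non-NA values occur in the 'End' column, get the max of 'LS' within each group, and remove the grouping column afterwards.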

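library(dplyr)

# df1 is the data frame from the question
df1 %>% 
  # a new group starts at every row where End is not NA
  group_by(wbcode, grp = cumsum(!is.na(End))) %>% 
  # group maximum of LS, kept only where End is present (NA^TRUE is NA, NA^FALSE is 1)
  mutate(LS_max = max(LS, na.rm = TRUE) * NA^is.na(End)) %>%
  ungroup() %>%
  select(-grp) %>%
  as.data.frame()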
#   wbcode  End         LS year    LS_max
#1     ARG   NA 0.95804196 1974        NA
#2     ARG   NA 1.20320197 1975        NA
#3     ARG   NA 1.16087599 1976        NA
#4     ARG   NA 0.35443089 1977        NA
#5     ARG   NA 0.04751208 1978        NA
#6     ARG   NA 0.02361865 1979        NA
#7     ARG 1982 0.09169112 1980 0.4088005
#8     ARG   NA 0.14338254 1981        NA
#9     ARG   NA 0.40880051 1982        NA
#10    ARG   NA 0.38549598 1983        NA
#11    ARG   NA 0.24468808 1984        NA
#12    ARG   NA         NA 1985        NA
#13    ARG   NA         NA 1986        NA
#14    ARG   NA         NA 1987        NA
#15    ARG   NA         NA 1988        NA
#16    ARG 1991         NA 1989 1.2377448
#17    ARG   NA 1.23774479 1990        NA
#18    ARG   NA 1.06301681 1991        NA
#19    ARG   NA 0.67083449 1992        NA
#20    ARG   NA 0.60283372 1993        NA
#21    ARG   NA 0.43794653 1994        NA
#22    ARG 1995 0.46857015 1995 0.4685701
#23    ARG   NA 0.30623826 1996        NA
#24    ARG   NA 0.02413010 1997        NA
#25    ARG   NA 0.02012132 1998        NA
#26    ARG   NA 0.02235587 1999        NA
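
To see why the grouping works, here is a minimal sketch on a made-up End vector (the values are illustrative, not taken from the question). It shows the group key produced by cumsum(!is.na(End)) and the NA^is.na(End) mask that blanks out every row without an End:

```r
# Toy vector standing in for the End column (hypothetical values)
End <- c(NA, NA, 1982, NA, NA, 1991, NA)

# Every non-NA End starts a new group; rows before the first End land in group 0
grp <- cumsum(!is.na(End))
grp
# 0 0 1 1 1 2 2

# NA^TRUE is NA and NA^FALSE is 1, so multiplying by this mask
# keeps a value only on the rows where End is present
mask <- NA^is.na(End)
mask
# NA NA 1 NA NA 1 NA
```

Note that each group runs from an End row up to the row before the next End row, so the maximum is taken over that whole stretch rather than strictly over the years from year to End. On the sample data both readings give the same LS_max values.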

Answer 1: (score: 1)

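This can be done with a complex-condition aggregating self-join.

The query left joins to each row of the a instance of DF all rows of the b instance of DF that have the same wbcode and satisfy the between condition. Then, for each a row of the result, it takes the maximum of the joined LS values from b.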

library(sqldf)

sqldf("select a.*, max(b.LS) as LS_max 
  from DF a 
  left join DF b on a.wbcode = b.wbcode and b.year between a.year and a.End
  group by a.rowid")
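
For readers without sqldf, the same window logic can be sketched in base R. This is only an illustration under stated assumptions: window_max is a hypothetical helper name, toy is made-up data rather than the data from the question, and year is converted to numeric because the question stores it as character. Like SQL's max, it ignores missing LS values and returns NA when the window holds none.

```r
# Hypothetical helper: for each row with a non-NA End, take the max of LS over
# the rows with the same wbcode whose year lies between that row's year and End.
window_max <- function(df) {
  yr <- as.numeric(df$year)  # year is stored as character in the question
  vapply(seq_len(nrow(df)), function(i) {
    if (is.na(df$End[i])) return(NA_real_)
    in_window <- df$wbcode == df$wbcode[i] & yr >= yr[i] & yr <= df$End[i]
    vals <- df$LS[in_window]
    vals <- vals[!is.na(vals)]          # drop missing LS values, as SQL max does
    if (length(vals) == 0) NA_real_ else max(vals)
  }, numeric(1))
}

# Small made-up data set with two wbcode values (not the data from the question)
toy <- data.frame(
  wbcode = c("A", "A", "A", "A", "B", "B"),
  End    = c(NA, 2002, NA, NA, 2001, NA),
  LS     = c(0.5, 0.1, 0.9, 0.3, NA, 0.7),
  year   = c("2000", "2001", "2002", "2003", "2000", "2001"),
  stringsAsFactors = FALSE
)

toy$LS_max <- window_max(toy)
toy$LS_max
# NA 0.9 NA NA 0.7 NA
```

The row-by-row loop is quadratic in the number of rows, which is fine for a sketch but may be slow on large data; the join-based query above pushes the same comparison into the database engine.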