Question

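I need to loop through the years in a df and replace the values in a column (percentpp) only if the next year's percentpp is greater. Once a cell reaches a high value, it cannot go back down, only up, like a ratchet.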

For example, what I have:

  cell_id   year    percentpp
1   40      2011    3
2   41      2011    1
3   42      2011    0
4   43      2011    0
5   40      2012    1
6   41      2012    5
7   42      2012    1   
8   43      2012    5
9   40      2013    2
10  41      2013    2
11  42      2013    2
12  43      2013    0
13  40      2014    2   
14  41      2014    3   
15  42      2014    3   
16  43      2014    3

And what I would like it to become:

  cell_id   year    percentpp
1   40      2011    3
2   41      2011    1
3   42      2011    0
4   43      2011    0
5   40      2012    3
6   41      2012    5
7   42      2012    1   
8   43      2012    5
9   40      2013    3
10  41      2013    5
11  42      2013    2   
12  43      2013    5    
13  40      2014    3   
14  41      2014    5   
15  42      2014    3   
16  43      2014    5

I imagine a function using this pseudo R/SQL logic, then looping with lapply:

function(x) {
  if df$percentpp[year+1,] is greater than df$percentpp[year,]
  when cell_id$year = cell_id$year+1
  then df$percentpp[year,] <- df$percentpp[year+1,]
}

But I'm not sure how to do this properly.

Answer 1

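You can use cummax, which returns the running (cumulative) maximum of a vector. Applied per cell_id with the rows in year order, it gives exactly the ratchet behaviour.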
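As a minimal sketch of what cummax does, here it is applied to a single cell's values. The vector `x` is the cell_id 40 series from the question; the names `x` and `ratcheted` are only illustrative.

```r
# percentpp for cell_id 40 across 2011-2014, taken from the question's data
x <- c(3, 1, 2, 2)

# cummax() returns the running maximum: each element is the largest value seen so far
ratcheted <- cummax(x)
print(ratcheted)
```

Once the series hits 3 it never drops again, which is the behaviour the question asks for.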

Using base R:

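# Sort by year so the running maximum moves forward in time
df <- df[order(df$year), ]
# Replace percentpp with its running maximum within each cell_id
df$percentpp <- ave(df$percentpp, df$cell_id, FUN = cummax)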
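If the rows are not already in year order and you would rather not reorder the data frame, one possible variant is to compute the running maximum through an ordering index and write the result back in place. This is only a sketch; the small unsorted `df` and the index name `o` are made up for illustration.

```r
# Hypothetical unsorted input: one cell, years out of order
df <- data.frame(
  cell_id   = c(41, 41, 41),
  year      = c(2013, 2011, 2012),
  percentpp = c(2, 1, 5)
)

# Row positions sorted by cell_id, then by year
o <- order(df$cell_id, df$year)

# Running maximum per cell in chronological order, assigned back to the original rows
df$percentpp[o] <- ave(df$percentpp[o], df$cell_id[o], FUN = cummax)
print(df)
```

The rows stay where they were; only percentpp changes.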

Using dplyr:

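library(dplyr)

df <- df %>%
  group_by(cell_id) %>%                      # one running maximum per cell
  arrange(year) %>%                          # sort by year (arrange() ignores groups by default)
  mutate(percentpp = cummax(percentpp)) %>%  # ratchet: keep the largest value seen so far
  ungroup()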

Data:

df <- read.table(text = "
  cell_id   year    percentpp
1   40      2011    3
2   41      2011    1
3   42      2011    0
4   43      2011    0
5   40      2012    1
6   41      2012    5
7   42      2012    1   
8   43      2012    5
9   40      2013    2
10  41      2013    2
11  42      2013    2
12  43      2013    0
13  40      2014    2   
14  41      2014    3   
15  42      2014    3   
16  43      2014    3 
")
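To check the approach against the desired output in the question, a self-contained sketch along these lines could be used. It rebuilds the same data with data.frame() instead of read.table(), and the name `expected` is introduced here only for the comparison.

```r
# Rebuild the question's data (same values as the read.table() call above)
df <- data.frame(
  cell_id   = rep(40:43, times = 4),
  year      = rep(2011:2014, each = 4),
  percentpp = c(3, 1, 0, 0,  1, 5, 1, 5,  2, 2, 2, 0,  2, 3, 3, 3)
)

# Sort by year so the running maximum moves forward in time
df <- df[order(df$year), ]
# Running maximum of percentpp within each cell_id
df$percentpp <- ave(df$percentpp, df$cell_id, FUN = cummax)

# Desired percentpp column from the question, in the same row order
expected <- c(3, 1, 0, 0,  3, 5, 1, 5,  3, 5, 2, 5,  3, 5, 3, 5)

# Stops with an error if any value differs from the desired output
stopifnot(all(df$percentpp == expected))
```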
