Subset and change values of a grouped data frame

Time: 2017-05-08 19:06:16

Tags: r dplyr

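I need to:

  • group by year
  • if any of the first five values of any column is higher than threshold
  • change it to the threshold value, otherwise
  • leave it as is.

I need this to work for data frames with many columns and groups. (https://github.com/Azure/acs-engine/pull/498) Unfortunately, I do not know how to do this.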

Data frame:

             year var1 var2  #Desired output:
        1     1   10    1    #change first five values in  var1 to 2 for year 1
        2     1   11    1
        3     1   12    1
        4     1   13    1
        5     1   14    1
        6     1   15    1
        7     1   16    1
        8     1   17    1
        9     1   18    1
        10    1   19    1
        11    2   20    1     #change first five values in  var1 to 2 for year 2
        12    2   21    1     #var2 stays the same since it is below threshold
        13    2   22    1
        14    2   23    1
        15    2   24    1
        16    2   25    1
        17    2   26    1
        18    2   27    1
        19    2   28    1
        20    2   29    1

Code for the data frame:

threshold <- 2
df <- data.frame(year = c(rep(1, 10), rep(2, 10)),
                 var1 = seq(10, 29, 1),
                 var2 = rep(1, 20))
# Desired output: the first five var1 values in each year are capped at threshold
df_out <- data.frame(year = c(rep(1, 10), rep(2, 10)),
                     var1 = c(rep(2, 5), seq(15, 19, 1), rep(2, 5), seq(25, 29, 1)),
                     var2 = rep(1, 20))

2 answers:

Answer 0 (score: 1)

I think using data.table with ifelse will work:

library(data.table)

# Within each year, cap the first five var1 values at threshold
setDT(df)[, var1 := ifelse(var1 >= threshold & seq_len(.N) <= 5, threshold, var1), by = year]

df
 #   year var1 var2
 #1:    1    2    1
 #2:    1    2    1
 #3:    1    2    1
 #4:    1    2    1
 #5:    1    2    1
 #6:    1   15    1
 #7:    1   16    1
 #8:    1   17    1
 #9:    1   18    1
#10:    1   19    1
#11:    2    2    1
#12:    2    2    1
#13:    2    2    1
#14:    2    2    1
#15:    2    2    1
#16:    2   25    1
#17:    2   26    1
#18:    2   27    1
#19:    2   28    1
#20:    2   29    1
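
The call above only touches var1, while the question asks for something that works across many columns. One possible way to extend it is sketched below, assuming every column other than year should be capped. The helper cap_first and the names dt and value_cols are made up for this illustration and are not part of the original answer.

```r
library(data.table)

threshold <- 2
dt <- data.table(year = rep(c(1, 2), each = 10),
                 var1 = seq(10, 29, 1),
                 var2 = rep(1, 20))

# Hypothetical helper: cap the first n values of x at limit
cap_first <- function(x, limit, n = 5) {
  idx <- seq_along(x) <= n & x > limit
  x[idx] <- limit
  x
}

# Every column except the grouping column (an assumption of this sketch)
value_cols <- setdiff(names(dt), "year")

# Apply the helper to each value column within each year
dt[, (value_cols) := lapply(.SD, cap_first, limit = threshold),
   by = year, .SDcols = value_cols]
```

Passing the column names through .SDcols keeps the grouping column out of the update, so adding more value columns to dt needs no change to the call.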

Answer 1 (score: 1)

You can use something like this in dplyr (I am adding a third column of sample values to show that it works for multiple columns):

df$var3 <- sample(1:100, nrow(df))
head(df)
  year var1 var2 var3
1    1   10    1   54
2    1   11    1   60
3    1   12    1   26
4    1   13    1   28
5    1   14    1    7
6    1   15    1  100

library(dplyr)

df %>%
  group_by(year) %>%
  mutate_each(funs(ifelse(row_number() <= 5 & . >= threshold, threshold, .)))
Source: local data frame [20 x 4]
Groups: year [2]

    year  var1  var2  var3
   <dbl> <dbl> <dbl> <dbl>
1      1     2     1     2
2      1     2     1     2
3      1     2     1     2
4      1     2     1     2
5      1     2     1     2
6      1    15     1   100
7      1    16     1    25
8      1    17     1     1
9      1    18     1    55
10     1    19     1    48
11     2     2     1     2
12     2     2     1     2
13     2     2     1     2
14     2     2     1     2
15     2     2     1     2
16     2    25     1     9
17     2    26     1    63
18     2    27     1     2
19     2    28     1     5
20     2    29     1    61
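
mutate_each() and funs() have since been deprecated in dplyr. A rough equivalent using across() is sketched below, assuming dplyr 1.0.0 or later. The name result is only for illustration, and the data frame is rebuilt without the random var3 column so that the outcome is reproducible.

```r
library(dplyr)

threshold <- 2
df <- data.frame(year = rep(c(1, 2), each = 10),
                 var1 = seq(10, 29, 1),
                 var2 = rep(1, 20))

result <- df %>%
  group_by(year) %>%
  # across() skips the grouping column, so only var1 and var2 are modified;
  # seq_along(.x) gives the position of each value within its group
  mutate(across(everything(),
                ~ ifelse(seq_along(.x) <= 5 & .x > threshold, threshold, .x))) %>%
  ungroup()
```

This version uses > rather than >=, following the wording of the question. The result is the same either way, because a value equal to threshold is replaced by itself.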