Creating a new variable by detecting the maximum value for each id

Time: 2015-08-10 17:33:32

Tags: r

My dataset contains three variables:

id <- c(1,1,1,1,1,1,2,2,2,2,5,5,5,5,5,5)
ind <- c(0,0,0,0,0,0,1,1,1,1,1,1,1,1,1,1)
price <- c(1,2,3,4,5,6,1,2,3,4,1,2,3,4,5,6)
mdata <- data.frame(id,ind,price)

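I need to create a new variable (ind2). If ind = 0, then ind2 = 0. If ind = 1, then ind2 is also 0, except in the row where price is at its maximum for that id, where ind2 = 1.

The new data should look like this: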

id  ind  ind2  price
1   0    0     1
1   0    0     2
1   0    0     3
1   0    0     4
1   0    0     5
1   0    0     6
2   1    0     1
2   1    0     2
2   1    0     3
2   1    1     4
5   1    0     1
5   1    0     2
5   1    0     3
5   1    0     4
5   1    0     5
5   1    1     6

1 answer:

Answer 0: (score: 6)

library(dplyr)
mdata %>% 
  group_by(id) %>%
  mutate(ind2 = +(ind == 1L & price == max(price)))

#    id ind price ind2
# 1   1   0     1    0
# 2   1   0     2    0
# 3   1   0     3    0
# 4   1   0     4    0
# 5   1   0     5    0
# 6   1   0     6    0
# 7   2   1     1    0
# 8   2   1     2    0
# 9   2   1     3    0
# 10  2   1     4    1
# 11  5   1     1    0
# 12  5   1     2    0
# 13  5   1     3    0
# 14  5   1     4    0
# 15  5   1     5    0
# 16  5   1     6    1
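The unary `+` wrapped around the condition is what turns the logical result into 0/1. A minimal sketch of that idiom on a toy vector follows; the names `flag`, `plus_flag` and `int_flag` are made up for illustration and do not appear in the question.

```r
# Toy inputs standing in for one group's ind and price columns
ind_toy   <- c(1, 1, 0)
price_toy <- c(3, 4, 4)

# Logical vector: TRUE only where ind is 1 AND price equals the maximum
flag <- ind_toy == 1 & price_toy == max(price_toy)

# Unary plus coerces TRUE/FALSE to integer 1/0
plus_flag <- +flag

# as.integer() is the more explicit spelling of the same coercion
int_flag <- as.integer(flag)

print(plus_flag)
print(identical(plus_flag, int_flag))
```

Note that if several rows in a group tie for the maximum price, the condition is TRUE for each of them, so every tied row gets ind2 = 1.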

Or, if you prefer data.table:

library(data.table)  # provides setDT() and :=, which adds ind2 by reference
setDT(mdata)[, ind2 := +(ind == 1L & price == max(price)), by = id]

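Or with base R:

# split() returns groups in sorted id order, so this assumes the rows are already sorted by id
mdata$ind2 <- unlist(lapply(split(mdata, mdata$id),
                            function(x) +(x$ind == 1L & x$price == max(x$price))))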
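The split()/lapply() version concatenates its results in group order, so it only lines up with the rows when the data is already sorted by id, as it is here. As a hedged alternative (not part of the original answer), base R's `ave()` returns the group-wise maximum aligned to the original rows, so it should work regardless of row order. The `shuffled` data frame below is a hypothetical example added just to illustrate that.

```r
# Rebuild the example data from the question
id    <- c(1,1,1,1,1,1,2,2,2,2,5,5,5,5,5,5)
ind   <- c(0,0,0,0,0,0,1,1,1,1,1,1,1,1,1,1)
price <- c(1,2,3,4,5,6,1,2,3,4,1,2,3,4,5,6)
mdata <- data.frame(id, ind, price)

# ave(price, id, FUN = max) gives each row the maximum price of its id
mdata$ind2 <- with(mdata, +(ind == 1 & price == ave(price, id, FUN = max)))

# Same computation on a row-shuffled copy (hypothetical, for illustration)
set.seed(1)
shuffled <- mdata[sample(nrow(mdata)), c("id", "ind", "price")]
shuffled$ind2 <- with(shuffled, +(ind == 1 & price == ave(price, id, FUN = max)))

# Rows flagged in the shuffled copy: the max-price row of each id with ind == 1
print(shuffled[shuffled$ind2 == 1, ])
```

Because the comparison is done row by row against an already aligned vector, no reordering or re-joining of groups is needed.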