How to mutate a column with an ID within groups

Time: 2014-09-04 06:47:45

Tags: r dplyr

How can I mutate a column so that it holds an ID within each group?

I have a data.frame like:

a b c
1 a 1 1
2 a 1 2
3 a 2 3
4 b 1 4
5 b 2 5
6 b 3 6

Group by a. Within each group the flag starts at 1; if b equals the previous b, the flag keeps its value, otherwise flag += 1:

  a b c flag
1 a 1 1    1  <-  group a start with 1
2 a 1 2    1  <-- in group a, 1(in row 2)=1(in row 1)
3 a 2 3    2  <-  in group a, 2(in row 3)!=1(in row 2)
4 b 1 4    1  <-  group b start with 1
5 b 2 5    2  <-  in group b, 2(in row 5)!=1(in row 4)
6 b 3 6    3  <-  in group b, 3(in row 6)!=2(in row 5)
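For reference, the rule above can be sketched in base R with run-length encoding. This is only an illustration under my reading of the question: run_id is a hypothetical helper name, and the data frame is rebuilt from the table shown above.

```r
# Rebuild the example data from the question.
x <- data.frame(a = rep(c("a", "b"), each = 3),
                b = c(1, 1, 2, 1, 2, 3),
                c = 1:6,
                stringsAsFactors = FALSE)

# run_id() is a hypothetical helper: it numbers consecutive runs of equal values.
run_id <- function(v) {
  r <- rle(v)                                   # lengths of each run of equal values
  rep(seq_along(r$lengths), times = r$lengths)  # repeat the run number over each run
}

# Apply the helper separately within each group of a.
x$flag <- ave(x$b, x$a, FUN = run_id)
x
```

On this data the flag column should match the expected output in the question.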

3 answers:

Answer 0 (score: 2)

I am currently using this:

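x$flag <- 1  # initialise the column; the first row starts its group at 1
for (i in seq_len(nrow(x))[-1]) {
    # restart at 1 when a changes; otherwise keep or increment the previous flag
    x[i, 'flag'] <- ifelse(x[i, 'a'] != x[i-1, 'a'], 1,
                    ifelse(x[i, 'b'] == x[i-1, 'b'], x[i-1, 'flag'], x[i-1, 'flag'] + 1))
}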

But it is inefficient on large datasets.

Update

dense_rank in dplyr gives me the answer:

> x %>% group_by(a) %>% mutate(dense_rank(b))
Source: local data frame [10 x 4]
Groups: a

   a b  c dense_rank(b)
1  a x  1             1
2  a x  2             1
3  a y  3             2
4  b x  4             1
5  b y  5             2
6  b z  6             3
7  c x  7             1
8  c y  8             2
9  c z  9             3
10 c z 10             3

Thanks.
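One thing that may be worth checking with the dense_rank approach: as far as I can tell it agrees with the "compare with the previous b" rule only when b is already sorted within each group. The sketch below uses base R only; match(b, sort(unique(b))) is my stand-in for a dense rank, and the vector names are made up for the illustration.

```r
# Sorted values: dense rank and run counting agree.
b_sorted <- c(1, 1, 2)
dense_sorted <- match(b_sorted, sort(unique(b_sorted)))
runs_sorted  <- cumsum(c(TRUE, b_sorted[-1] != b_sorted[-length(b_sorted)]))

# Unsorted values: the two notions differ.
b_unsorted <- c(1, 2, 1)
dense_unsorted <- match(b_unsorted, sort(unique(b_unsorted)))   # rank of each value
runs_unsorted  <- cumsum(c(TRUE, b_unsorted[-1] != b_unsorted[-length(b_unsorted)]))  # +1 on every change

dense_unsorted  # 1 2 1
runs_unsorted   # 1 2 3
```

If b can go back to an earlier value inside a group, the run-counting form is probably the one that matches the rule in the question.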

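Answer 1 (score: 1)

I'm not entirely sure what you are trying to do, but it seems to me that you want to assign an index number to the values of b within each group of a (a or b).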

# I modified your example here.

a <- rep(c("a","b"), each = 3)
b <- c(4,4,5,11,12,13)
c <- 1:6

foo <- data.frame(a, b, c, stringsAsFactors = FALSE)

  a  b c
1 a  4 1
2 a  4 2
3 a  5 3
4 b 11 4
5 b 12 5
6 b 13 6

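# Since you referred to dplyr, I will use it.
library(dplyr)
library(data.table)  # for rbindlist()

cats <- list()
for (i in unique(foo$a)) {
  # within one group of a, sort by b and number its distinct values
  ana <- foo %>%
    filter(a == i) %>%
    arrange(b) %>%
    mutate(indexInb = as.integer(as.factor(b)))

  cats[[i]] <- ana
}

# stack the per-group results back into one table
bob <- rbindlist(cats)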

   a  b c indexInb
1: a  4 1        1
2: a  4 2        1
3: a  5 3        2
4: b 11 4        1
5: b 12 5        2
6: b 13 6        3
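The same filter / index / bind pattern can also be sketched without an explicit loop or extra packages, using split, lapply and rbind from base R. This is just one possible rewrite under the same assumptions as the loop above, not a claim that it is faster; pieces is a name I made up.

```r
foo <- data.frame(a = rep(c("a", "b"), each = 3),
                  b = c(4, 4, 5, 11, 12, 13),
                  c = 1:6,
                  stringsAsFactors = FALSE)

# Split by a, index the distinct values of b inside each piece, then recombine.
pieces <- lapply(split(foo, foo$a), function(d) {
  d <- d[order(d$b), ]                   # sort within the group, like arrange(b)
  d$indexInb <- as.integer(factor(d$b))  # 1 for the smallest b, 2 for the next, ...
  d
})
bob <- do.call(rbind, pieces)
rownames(bob) <- NULL                    # drop the "a.1", "a.2", ... row names
bob
```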

Answer 2 (score: 1)

Here is a quick vectorized solution that does not need any for loops.

A base R solution using ave and transform:

transform(x, flag = ave(b, a, FUN = function(x) cumsum(c(1, diff(x)))))
#   a b c flag
# 1 a 1 1    1
# 2 a 1 2    1
# 3 a 2 3    2
# 4 b 1 4    1
# 5 b 2 5    2
# 6 b 3 6    3

A data.table solution (more efficient)

library(data.table)
setDT(x)[, flag := cumsum(c(1, diff(b))), by = a]
x
#    a b c flag
# 1: a 1 1    1
# 2: a 1 2    1
# 3: a 2 3    2
# 4: b 1 4    1
# 5: b 2 5    2
# 6: b 3 6    3

A dplyr solution (since you tagged it)

library(dplyr)
x %>%
  group_by(a) %>%
  mutate(flag = cumsum(c(1, diff(b))))
# Source: local data frame [6 x 4]
# Groups: a
# 
#   a b c flag
# 1 a 1 1    1
# 2 a 1 2    1
# 3 a 2 3    2
# 4 b 1 4    1
# 5 b 2 5    2
# 6 b 3 6    3
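A note on cumsum(c(1, diff(b))): it reproduces the flag here because b only ever steps by 0 or 1 inside a group. If b can jump by more than 1, a variant that counts changes rather than summing differences may be safer. The sketch below is an assumption-laden illustration in base R; the data and the names by_diff and by_change are made up.

```r
# Made-up data where b jumps by more than 1 inside a group.
y <- data.frame(a = rep(c("a", "b"), times = c(3, 2)),
                b = c(1, 1, 4, 4, 6),
                stringsAsFactors = FALSE)

# Summing the differences carries the size of each jump into the flag.
by_diff <- ave(y$b, y$a, FUN = function(v) cumsum(c(1, diff(v))))

# Counting the changes adds exactly 1 whenever b differs from the previous row.
by_change <- ave(y$b, y$a, FUN = function(v) cumsum(c(1, diff(v) != 0)))

by_diff    # 1 1 4 1 3
by_change  # 1 1 2 1 2
```

The same diff(b) != 0 expression should drop into the data.table and dplyr calls above in place of diff(b), as long as b is numeric.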