Subset a data frame on a sequence in a vector

Time: 2012-11-30 11:47:20

Tags: r

I have a data frame df, and I want to subset df based on a sequence of numbers within each category.

 x  <- c(1,2,3,4,5,7,9,11,13)
 x2 <- x+77 
 df <- data.frame(x=c(x,x2),y= c(rep("A",9),rep("B",9)))

 df
    x y
1   1 A
2   2 A
3   3 A
4   4 A
5   5 A
6   7 A
7   9 A
8  11 A
9  13 A
10 78 B
11 79 B
12 80 B
13 81 B
14 82 B
15 84 B
16 86 B
17 88 B
18 90 B

I only want the rows where x increases by 1, not the rows where x increases by 2, e.g.

    x y
1   1 A
2   2 A
3   3 A
4   4 A
5   5 A
10 78 B
11 79 B
12 80 B
13 81 B
14 82 B

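I think I have to do some subtraction between elements, check whether the difference is >1, and combine that with ddply, but this seems cumbersome. Is there some kind of sequence function I am missing?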

2 answers:

Answer 0: (score: 3)

Use diff

df[which(c(1,diff(df$x))==1),]
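
Note that on the sample data diff() also runs across the boundary between groups A and B (78 - 13 = 65), so row 10 is not selected by the one-liner above. A minimal sketch, assuming the sequence should restart within each level of y, using base R ave() (the names keep and res are just illustrative):

```r
# Rebuild the sample data from the question
x  <- c(1, 2, 3, 4, 5, 7, 9, 11, 13)
x2 <- x + 77
df <- data.frame(x = c(x, x2), y = c(rep("A", 9), rep("B", 9)))

# Within each level of y, compute the step from the previous x.
# Padding diff() with a leading 1 means the first row of every group is kept.
keep <- ave(df$x, df$y, FUN = function(v) c(1, diff(v))) == 1

res <- df[keep, ]
res
```

Because of the padding, the first row of each group is always kept, even if the row after it is not one step away. That is fine for the sample data but may not be what you want in general.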

Answer 1: (score: 2)

Your example seems to be well behaved and is handled nicely by @agstudy's answer. Should your data act up one day, however...

myfun <- function(d, whichDiff = 1) {
  # d is the data.frame you'd like to subset, containing the variable 'x'
  # whichDiff is the difference between values of x you're looking for

  theWh <- which(!as.logical(diff(d$x) - whichDiff))
  # Take the diff of x, subtract whichDiff to get the desired values equal to 0
  # Coerce this to a logical vector and take the inverse (!)
  # which() gets the indexes that are TRUE.

  # allWh <- sapply(theWh, "+", 1)
  # Since the desired rows may be disjoint, use sapply to get each index + 1
  # Seriously? sapply to add 1 to a numeric vector? Not even on a Friday.
  allWh <- theWh + 1

  return(d[sort(unique(c(theWh, allWh))), ])
}
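
To see where the two approaches differ, here is a small sketch on made-up data (the vector 1, 2, 4, 5, 7 is hypothetical, not from the question) with more than one run of consecutive values. myfun is repeated in condensed form so the snippet runs on its own:

```r
# Condensed copy of myfun: diff(d$x) == whichDiff flags the same positions
# as !as.logical(diff(d$x) - whichDiff)
myfun <- function(d, whichDiff = 1) {
  theWh <- which(diff(d$x) == whichDiff)   # left-hand row of each matching pair
  d[sort(unique(c(theWh, theWh + 1))), ]   # keep both rows of every pair
}

# Hypothetical data with two separate runs: (1, 2) and (4, 5)
d <- data.frame(x = c(1, 2, 4, 5, 7), y = "A")

simple <- d[which(c(1, diff(d$x)) == 1), ]  # the padded-diff one-liner
paired <- myfun(d)

simple$x  # 1 2 5   (4, the start of the second run, is dropped)
paired$x  # 1 2 4 5 (both ends of each run are kept)
```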

> library(plyr)
> 
> ddply(df, .(y), myfun)
    x y
1   1 A
2   2 A
3   3 A
4   4 A
5   5 A
6  78 B
7  79 B
8  80 B
9  81 B
10 82 B
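
If you would rather not depend on plyr, the same per-group call can be sketched in base R with split() and lapply(). This assumes the condensed myfun below behaves like the one above; res is just an illustrative name:

```r
# Sample data from the question
x  <- c(1, 2, 3, 4, 5, 7, 9, 11, 13)
df <- data.frame(x = c(x, x + 77), y = c(rep("A", 9), rep("B", 9)))

# Condensed version of myfun from this answer
myfun <- function(d, whichDiff = 1) {
  theWh <- which(diff(d$x) == whichDiff)
  d[sort(unique(c(theWh, theWh + 1))), ]
}

# Split by y, apply myfun to each piece, and stack the pieces again
res <- do.call(rbind, lapply(split(df, df$y), myfun))
rownames(res) <- NULL  # drop the row names that rbind builds from the list names
res
```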