Question

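I have a data frame with id and score columns.

I want a way to flag the first occurrence of each id, similar to first. and last. in SAS. I have tried the !duplicated function, but I need to actually append the "flag" column to my data frame, since I will be running it through a loop later. I would like to get something like this: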

id  score   first_ind
1   15      1
1   18      0
1   16      0
2   10      1
2   9       0
3   8       1
3   47      0
3   21      0

Answer 1

> df$first_ind <- as.numeric(!duplicated(df$id))
> df
  id score first_ind
1  1    15         1
2  1    18         0
3  1    16         0
4  2    10         1
5  2     9         0
6  3     8         1
7  3    47         0
8  3    21         0
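
Since the question mentions both first. and last. from SAS, here is a minimal sketch that builds the example data and adds both flags. The data frame is reconstructed from the desired output in the question, and the last_ind column name is an assumption made for illustration; the fromLast argument of duplicated() is part of base R.

```r
# Example data reconstructed from the desired output in the question
df <- data.frame(
  id    = c(1, 1, 1, 2, 2, 3, 3, 3),
  score = c(15, 18, 16, 10, 9, 8, 47, 21)
)

# 1 for the first row of each id, 0 otherwise
df$first_ind <- as.numeric(!duplicated(df$id))

# 1 for the last row of each id (last_ind is a hypothetical column name)
df$last_ind <- as.numeric(!duplicated(df$id, fromLast = TRUE))
```

Because duplicated() looks at the values themselves rather than at neighbouring rows, both flags should also behave sensibly when the ids are not sorted.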

Answer 2

You can find the edges using diff, provided id is numeric and rows with the same id are adjacent.

x <- read.table(text = "id  score
1   15
1   18
1   16
2   10
2   9
3   8
3   47
3   21", header = TRUE)

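# Flag a row whenever id changes from the previous row; comparing against 0
# keeps the flag at 1 even when ids jump by more than one
x$first_id <- as.numeric(c(TRUE, diff(x$id) != 0))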
x

  id score first_id
1  1    15        1
2  1    18        0
3  1    16        0
4  2    10        1
5  2     9        0
6  3     8        1
7  3    47        0
8  3    21        0
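
One caveat with the diff approach may be worth a sketch: diff returns the size of each jump, so raw differences only look like 0/1 flags when consecutive ids differ by exactly one. The ids below are hypothetical, and raw_jump is a name chosen just for this illustration.

```r
# Hypothetical ids that are grouped together but not consecutive integers
id <- c(1, 1, 5, 5, 2)

# Raw differences are jump sizes, not 0/1 flags
raw_jump <- c(1, diff(id))

# Comparing against zero flags every change of id, whatever its size
first_id <- as.numeric(c(TRUE, diff(id) != 0))
```

Here raw_jump contains 4 and -3 at the group boundaries, while first_id is 1 at each boundary. Either way, the approach still relies on equal ids sitting next to each other.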

Answer 3

Using plyr:

library("plyr")
ddply(x,"id",transform,first=as.numeric(seq(length(score))==1))

Or, if you prefer dplyr:

library("dplyr")
x %>% group_by(id) %>% 
    mutate(first=c(1,rep(0,n()-1)))

(Though if you are working entirely within the plyr/dplyr framework, you probably don't need this flag variable at all...)
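
To illustrate that last remark: within dplyr the first row of each group can be selected directly, without a flag column. This is a minimal sketch, assuming dplyr is installed; x is rebuilt here so the block stands alone, and first_rows and flagged are hypothetical names.

```r
library(dplyr)

# Same example data as in Answer 2
x <- data.frame(
  id    = c(1, 1, 1, 2, 2, 3, 3, 3),
  score = c(15, 18, 16, 10, 9, 8, 47, 21)
)

# Keep only the first row of each id, with no flag column needed
first_rows <- x %>%
  group_by(id) %>%
  slice(1) %>%
  ungroup()

# If the flag is still wanted, row_number() gives the position within the group
flagged <- x %>%
  group_by(id) %>%
  mutate(first = as.numeric(row_number() == 1)) %>%
  ungroup()
```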

Answer 4

Another base R option:

df$first_ind <- ave(df$id, df$id, FUN = seq_along) == 1
df
#  id score first_ind
#1  1    15      TRUE
#2  1    18     FALSE
#3  1    16     FALSE
#4  2    10      TRUE
#5  2     9     FALSE
#6  3     8      TRUE
#7  3    47     FALSE
#8  3    21     FALSE

This also works when id is unsorted. If you want 1/0 instead of TRUE/FALSE, you can simply wrap it in as.integer(.).
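
A small sketch of the unsorted case, with the as.integer wrapper applied. The data frame df2 and its values are hypothetical, made up only to show ids appearing out of order.

```r
# Hypothetical data with the ids deliberately out of order
df2 <- data.frame(
  id    = c(2, 1, 2, 3, 1),
  score = c(10, 15, 9, 8, 18)
)

# ave() numbers the rows within each id; position 1 is the first occurrence
df2$first_ind <- as.integer(ave(df2$id, df2$id, FUN = seq_along) == 1)
```

Rows 1, 2 and 4 are the first occurrences of ids 2, 1 and 3, so those rows get the flag even though the ids are interleaved.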
