How to find the first occurrence based on a specific criterion

Time: 2017-11-20 20:42:28

Tags: r

I have the following sample data, and I am using data.table.

Id    V1     V2
Aa    10   1000
Aa    20     90
Ab    20      0
Ab    20     90
Ab    20    100
As    50     90
As   300    150
As   150      0

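I want to create another field that flags, with a 1, the first occurrence within each Id of a V2 value greater than 0. For example, for Id Ab only the second entry is flagged with 1.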

The desired result looks like this:

Id    V1     V2   Indicator
Aa    10   1000           1
Aa    20     90           0
Ab    20      0           0
Ab    20     90           1
Ab    20    100           0
As    50     90           1
As   300    150           0
As   150      0           0

1 Answer:

Answer 0 (score: 1)

library( "data.table" )
df6[, id := 1:.N] # create unique row id
df6[, indicator := 0 ]  # assign 0 for all rows of indicator column
# per Id, flag the first row with 0 < V2 < 10000 as -1 and the first row with V2 >= 10000 as 1
df6[ id %in% df6[ V2 > 0 & V2 < 10000, .SD[1], by = .(Id)][, id], indicator := -1]
df6[ id %in% df6[ V2 >= 10000, .SD[1], by = .(Id)][, id], indicator := 1]
df6[ , id := NULL ] # remove id column
df6
#    Id  V1      V2 indicator
# 1: Aa  10    1000        -1
# 2: Aa  20      90         0
# 3: Ab  20       0         0
# 4: Ab  20      90        -1
# 5: Ab  20     100         0
# 6: As  50      90        -1
# 7: As 300     150         0
# 8: As 150       0         0
# 9: Ap  10 1000000         1  

Based on @Jaap's comment:

df6[, indicator := 0 ][
  df6[, .I[V2 > 0 & V2 < 10000][1], by = Id]$V1, indicator := -1][
    df6[, .I[V2 >= 10000][1], by = Id]$V1, indicator := 1]
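
To make the chained version easier to follow, here is a minimal sketch of the intermediate step on its own. It rebuilds the same nine rows with data.table() instead of fread(), and the name first_small is only illustrative. Within each Id, .I holds the row numbers in the full table, so .I[condition][1] is the row number of the first match, or NA when the group has no match. Because the j expression is unnamed, data.table calls the result column V1, which is why the code above reads $V1; it is not the V1 column of the data.

```r
library(data.table)

# Same nine rows as the answer's data, built without fread()
df6 <- data.table(
  Id = c("Aa", "Aa", "Ab", "Ab", "Ab", "As", "As", "As", "Ap"),
  V1 = c(10, 20, 20, 20, 20, 50, 300, 150, 10),
  V2 = c(1000, 90, 0, 90, 100, 90, 150, 0, 1000000)
)

# Row number (in the whole table) of the first row per Id with 0 < V2 < 10000
first_small <- df6[, .I[V2 > 0 & V2 < 10000][1], by = Id]

# first_small has one row per Id; its auto-named column V1 holds
# the row numbers 1, 4, 6 and NA (Ap has no row in that range)
first_small
```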

Data: I added an extra row with a value greater than 10K

df6 <- fread('Id         V1       V2
    Aa         10      1000
             Aa          20       90
             Ab          20       0
             Ab          20       90
             Ab          20       100
             As          50       90
             As          300     150
             As           150     0
             Ap         10      1000000', header = TRUE)
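
If only the single 0/1 flag described in the question is needed, with no separate treatment of values above 10K, a grouped cumulative sum may be enough. This is a sketch under that assumption and not the method used above; the table name dt is illustrative, and the data are the eight rows from the question's expected output.

```r
library(data.table)

# The eight rows from the question's expected output
dt <- data.table(
  Id = c("Aa", "Aa", "Ab", "Ab", "Ab", "As", "As", "As"),
  V1 = c(10, 20, 20, 20, 20, 50, 300, 150),
  V2 = c(1000, 90, 0, 90, 100, 90, 150, 0)
)

# Within each Id, cumsum(V2 > 0) counts the positive values seen so far;
# a row is the first positive one when it is positive and that count is 1
dt[, Indicator := as.integer(V2 > 0 & cumsum(V2 > 0) == 1L), by = Id]

# Indicator is now 1, 0, 0, 1, 0, 1, 0, 0
dt
```

This avoids the temporary id column and relies on the rows already being in the intended order within each Id.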