I have the following sample data, and I am using data.table.
Id V1 V2
Aa 10 1000
Aa 20 90
Ab 20 0
Ab 20 90
Ab 20 100
As 50 90
As 300 150
As 150 0
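I want to create another field, Indicator, that is 1 on the first row within each Id whose V2 is greater than 0, and 0 on every other row. For example, for Id Ab only the second entry gets a 1.
Like this: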
Id V1 V2 Indicator
Aa 10 1000 1
Aa 20 90 0
Ab 20 0 0
Ab 20 90 1
Ab 20 100 0
As 50 90 1
As 300 150 0
As 150 0 0
Answer
library( "data.table" )
df6[, id := 1:.N] # create unique row id
df6[, indicator := 0 ] # assign 0 for all rows of indicator column
# first row per Id with 0 < V2 < 10000: assign -1 to the indicator column
df6[ id %in% df6[ V2 > 0 & V2 < 10000, .SD[1], by = .(Id)][, id], indicator := -1]
# first row per Id with V2 >= 10000: assign 1 to the indicator column
df6[ id %in% df6[ V2 >= 10000, .SD[1], by = .(Id)][, id], indicator := 1]
df6[ , id := NULL ] # remove id column
df6
# Id V1 V2 indicator
# 1: Aa 10 1000 -1
# 2: Aa 20 90 0
# 3: Ab 20 0 0
# 4: Ab 20 90 -1
# 5: Ab 20 100 0
# 6: As 50 90 -1
# 7: As 300 150 0
# 8: As 150 0 0
# 9: Ap 10 1000000 1
Based on @Jaap's comment, the same result in one chained call:
df6[, indicator := 0 ][
df6[, .I[V2 > 0 & V2 < 10000][1], by = Id]$V1, indicator := -1][
df6[, .I[V2 >= 10000][1], by = Id]$V1, indicator := 1]
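To see what the chained version passes as `i`, it can help to look at the intermediate row-number vectors. The sketch below rebuilds the answer's sample data as `dt` (the names `dt`, `row`, `lo_rows` and `hi_rows` are hypothetical, chosen for illustration). It names the grouped result `row` instead of relying on the automatic `V1` name, which is easy to confuse with the data column `V1`. `.I[cond][1]` gives one row number per `Id`, or `NA` when a group has no matching row; dropping those `NA`s before assigning is an extra defensive step I added, not part of the comment's version.

```r
library(data.table)

# Same sample data as the answer, including the extra Ap row with V2 >= 10K
dt <- data.table(
  Id = c("Aa", "Aa", "Ab", "Ab", "Ab", "As", "As", "As", "Ap"),
  V1 = c(10, 20, 20, 20, 20, 50, 300, 150, 10),
  V2 = c(1000, 90, 0, 90, 100, 90, 150, 0, 1000000)
)

# .I holds row numbers of the whole table; [1] keeps the first match per Id,
# and yields NA for a group with no matching row. Groups keep first-appearance order.
lo_rows <- dt[, .(row = .I[V2 > 0 & V2 < 10000][1]), by = Id]$row
hi_rows <- dt[, .(row = .I[V2 >= 10000][1]), by = Id]$row
print(lo_rows)  # row numbers 1, 4, 6 and NA (Ap has no value in range)
print(hi_rows)  # NA, NA, NA and 9 (only Ap has V2 >= 10000)

# Assign the indicator, skipping NA row numbers explicitly
dt[, indicator := 0]
dt[lo_rows[!is.na(lo_rows)], indicator := -1]
dt[hi_rows[!is.na(hi_rows)], indicator := 1]
print(dt)
```

The final `indicator` column should match the output printed in the answer above.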
Data (I added an extra row, Ap, whose V2 is greater than 10K):
df6 <- fread('Id V1 V2
Aa 10 1000
Aa 20 90
Ab 20 0
Ab 20 90
Ab 20 100
As 50 90
As 300 150
As 150 0
Ap 10 1000000', header = TRUE)
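For the rule exactly as asked in the question (1 on the first row per `Id` with `V2 > 0`, 0 everywhere else, with no 10K split), a grouped one-liner may be enough. This is a minimal sketch assuming the question's sample data with the first Ab row's V2 equal to 0; the column name `Indicator` follows the question's expected output, and the `cumsum` approach is my own choice rather than something from the answer or the comment.

```r
library(data.table)

# Sample data from the question (the first Ab row has V2 = 0)
dt <- data.table(
  Id = c("Aa", "Aa", "Ab", "Ab", "Ab", "As", "As", "As"),
  V1 = c(10, 20, 20, 20, 20, 50, 300, 150),
  V2 = c(1000, 90, 0, 90, 100, 90, 150, 0)
)

# Within each Id, cumsum(V2 > 0) first reaches 1 on the first row with V2 > 0
# and stays at 1 only until the next positive row, so requiring V2 > 0 as well
# marks exactly the first positive row. Groups with no positive V2 get all 0s.
dt[, Indicator := as.integer(V2 > 0 & cumsum(V2 > 0) == 1), by = Id]
print(dt)
```

Because `:=` updates `dt` by reference, no reassignment is needed, and the result should reproduce the question's expected `Indicator` column.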