State Year APPT mood ranney_4yrs folded_ranney_4yrs time censor
Arizona 1970 3 47.778 0.3299708 0.8299708 30 0
Arizona 1971 3 51.948 0.3265375 0.8265375 31 0
Arizona 1972 3 48.429 0.3246062 0.8246062 32 0
Arizona 1973 3 42.909 0.3226750 0.8226750 33 0
Arizona 1974 1 40.548 0.3683167 0.8683167 34 1
Arizona 1975 1 39.517 0.4139583 0.9139583 35 1
Arizona 1976 1 38.659 0.4543917 0.9543917 36 1
Arizona 1977 1 36.995 0.4948250 0.9948250 37 1
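I have this data frame, and I want to remove every instance of 1 in the censor column except the first one. What code can I write to keep the first instance and drop all subsequent instances of 1 in the censor column?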
Answer 0 (score: 0)
If the data frame is sorted by the censor column, this will do it:
df[df[,'censor']!=1 | !duplicated(df[,'censor']),]
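To make the logic of that one-liner easier to follow, here is a minimal sketch that builds the row mask step by step. The two-column data frame is a reduced, illustrative copy of the question's data (only Year and censor), and the names keep and result are introduced here for illustration.

```r
# Reduced copy of the question's data: only Year and censor (illustrative).
df <- data.frame(Year = 1970:1977, censor = c(0, 0, 0, 0, 1, 1, 1, 1))

# !duplicated() is TRUE only at the first occurrence of each distinct value;
# censor != 1 is TRUE for every row whose censor value is not 1.
keep <- df$censor != 1 | !duplicated(df$censor)

# Rows with censor 0 all survive; of the censor 1 rows, only the first does.
result <- df[keep, ]
result$Year
```

With this input the surviving rows should be the years 1970 through 1974, i.e. all the 0 rows plus the first 1 row.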
Answer 1 (score: 0)
This solution works regardless of the order of the censor column:
df[df$censor!=1 | ave(df$censor,df$censor,FUN=function(x) 1:length(x))==1,];
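It works by deriving a "running count" for each distinct censor value. I use the ave() function in a somewhat unusual way: it evaluates the expression 1:length(x) for each unique censor value, and ave() does the work needed to map each resulting "count vector" back to the positions where that censor value occurs in the grouping vector (i.e. the second argument of ave()).
The contents (but not the length) of the first argument to ave() are completely irrelevant, because the expression 1:length(x) depends only on the length of the group, not on its contents. (It still makes sense to reuse the grouping vector as the first argument, since it is guaranteed to have the right length, i.e. the same length as the grouping vector.)
The return value of ave() is therefore a running count for each censor value, ordered to match the order in which the values occur in the censor column. That running count can then be used in the indexing operation to select only the rows that occur first, i.e. those with a running count of 1. (This matters only for censor value 1; the LHS of the | selects all rows with any other censor value, regardless of how many times they occur.)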
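The running count described above can be inspected on its own. The following is a small sketch using a bare vector rather than the full data frame; the names run_count, same_count and keep are illustrative, and seq_along(x) stands in for 1:length(x), which is equivalent for non-empty groups.

```r
# Scrambled censor values, matching the demo data frame's censor column.
censor <- c(1, 0, 1, 0, 0, 1, 0, 1)

# Running count within each distinct censor value, in original row order.
run_count <- ave(censor, censor, FUN = function(x) seq_along(x))
run_count

# The first argument only supplies the length; a vector of zeros
# yields the same counts because FUN ignores the group contents.
same_count <- ave(rep(0, length(censor)), censor,
                  FUN = function(x) seq_along(x))

# Keep every non-1 row, plus the row where the count for value 1 is 1.
keep <- censor != 1 | run_count == 1
which(keep)
```

Here run_count interleaves two independent counters, one for the 0 rows and one for the 1 rows, so only the first 1 passes the == 1 test that matters.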
Here is a demo, in which I have scrambled the censor column slightly to show that the solution is order-agnostic:
df <- data.frame(
    State=c('Arizona','Arizona','Arizona','Arizona','Arizona','Arizona','Arizona','Arizona'),
    Year=c(1970,1971,1972,1973,1974,1975,1976,1977),
    APPT=c(3,3,3,3,1,1,1,1),
    mood=c(47.778,51.948,48.429,42.909,40.548,39.517,38.659,36.995),
    ranney_4yrs=c(0.3299708,0.3265375,0.3246062,0.3226750,0.3683167,0.4139583,0.4543917,0.4948250),
    folded_ranney_4yrs=c(0.8299708,0.8265375,0.8246062,0.8226750,0.8683167,0.9139583,0.9543917,0.9948250),
    time=c(30,31,32,33,34,35,36,37),
    censor=c(1,0,1,0,0,1,0,1)
);
df;
## State Year APPT mood ranney_4yrs folded_ranney_4yrs time censor
## 1 Arizona 1970 3 47.778 0.3299708 0.8299708 30 1
## 2 Arizona 1971 3 51.948 0.3265375 0.8265375 31 0
## 3 Arizona 1972 3 48.429 0.3246062 0.8246062 32 1
## 4 Arizona 1973 3 42.909 0.3226750 0.8226750 33 0
## 5 Arizona 1974 1 40.548 0.3683167 0.8683167 34 0
## 6 Arizona 1975 1 39.517 0.4139583 0.9139583 35 1
## 7 Arizona 1976 1 38.659 0.4543917 0.9543917 36 0
## 8 Arizona 1977 1 36.995 0.4948250 0.9948250 37 1
df[df$censor!=1 | ave(df$censor,df$censor,FUN=function(x) 1:length(x))==1,];
## State Year APPT mood ranney_4yrs folded_ranney_4yrs time censor
## 1 Arizona 1970 3 47.778 0.3299708 0.8299708 30 1
## 2 Arizona 1971 3 51.948 0.3265375 0.8265375 31 0
## 4 Arizona 1973 3 42.909 0.3226750 0.8226750 33 0
## 5 Arizona 1974 1 40.548 0.3683167 0.8683167 34 0
## 7 Arizona 1976 1 38.659 0.4543917 0.9543917 36 0