Delete rows in a data frame conditional on previous behavior

Time: 2015-03-30 21:54:57

Tags: r dataframe subset

  State Year APPT   mood ranney_4yrs folded_ranney_4yrs time censor
Arizona 1970    3 47.778   0.3299708          0.8299708   30      0
Arizona 1971    3 51.948   0.3265375          0.8265375   31      0
Arizona 1972    3 48.429   0.3246062          0.8246062   32      0
Arizona 1973    3 42.909   0.3226750          0.8226750   33      0
Arizona 1974    1 40.548   0.3683167          0.8683167   34      1
Arizona 1975    1 39.517   0.4139583          0.9139583   35      1
Arizona 1976    1 38.659   0.4543917          0.9543917   36      1
Arizona 1977    1 36.995   0.4948250          0.9948250   37      1

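I have this data frame, and I want to delete every instance of 1 in the censor column except the first one. What code can I write to keep the first instance and delete all subsequent instances of 1 in the censor column?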

2 answers:

Answer 0 (score: 0)

This will do it, if the data frame is ordered by the censor column:
df[df[,'censor']!=1 | !duplicated(df[,'censor']),]
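
To make the two conditions in that expression easier to see, here is a minimal sketch that applies it to a reduced copy of the question's data. Only the Year and censor columns are kept, and the names first_seen and result are introduced purely for illustration.

```r
# Reduced version of the question's data: only Year and censor are kept.
df <- data.frame(
  Year   = 1970:1977,
  censor = c(0, 0, 0, 0, 1, 1, 1, 1)
)

# duplicated() is FALSE for the first occurrence of each value and TRUE afterwards,
# so its negation flags the first row in which each censor value appears.
first_seen <- !duplicated(df[, 'censor'])

# Keep every row whose censor is not 1, plus the first row whose censor is 1.
result <- df[df[, 'censor'] != 1 | first_seen, ]
result$Year
## [1] 1970 1971 1972 1973 1974
```

The four rows with censor 0 pass the left-hand condition, and the 1974 row passes the right-hand one because it is the first row with censor 1.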

Answer 1 (score: 0)

This solution will work regardless of the order of the censor column:

df[df$censor!=1 | ave(df$censor,df$censor,FUN=function(x) 1:length(x))==1,];

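It works by deriving a "running count" for each distinct censor value. I'm using the ave() function in a rather unusual way to evaluate the expression 1:length(x) for each unique censor value, and ave() does the necessary work of mapping each resulting "count vector" back to the positions at which that censor value occurs in the grouping vector (i.e. the second argument of ave()). The contents (but not the length) of the first argument of ave() are completely irrelevant, because the expression 1:length(x) depends only on the length of the group, not on its contents. (It makes sense to reuse the grouping vector as the first argument, though, since it is guaranteed to have the right length, namely the same length as the grouping vector.) Thus the return value of ave() is the running count for each censor value, correctly ordered according to where the values occur in the censor column. The running count can then be used in the index operation to select only the rows that occur first, i.e. the rows with a running count of 1 (at least for censor value 1; the LHS of the | selects all other censor values regardless of their occurrence count).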
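
The running count is easier to follow when printed on its own. The sketch below uses a bare censor vector (the shuffled one from the demo that follows) rather than the full data frame, and the names run_count and keep are introduced only for illustration. seq_along(x) is used here in place of 1:length(x); the two agree for non-empty groups.

```r
# A shuffled censor column, used on its own without the rest of the data frame.
censor <- c(1, 0, 1, 0, 0, 1, 0, 1)

# For each distinct censor value, number its occurrences 1, 2, 3, ...
# ave() writes each group's counts back to the positions where that value occurs.
run_count <- ave(censor, censor, FUN = function(x) seq_along(x))
run_count
## [1] 1 1 2 2 3 3 4 4

# Keep all rows that are not 1, plus the row holding the first occurrence of 1.
keep <- censor != 1 | run_count == 1
which(keep)
## [1] 1 2 4 5 7
```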

Here's a demo, in which I've messed with the censor column slightly to demonstrate that the solution is order-agnostic:

df <- data.frame(State=c('Arizona','Arizona','Arizona','Arizona','Arizona','Arizona','Arizona','Arizona'), Year=c(1970,1971,1972,1973,1974,1975,1976,1977), APPT=c(3,3,3,3,1,1,1,1), mood=c(47.778,51.948,48.429,42.909,40.548,39.517,38.659,36.995), ranney_4yrs=c(0.3299708,0.3265375,0.3246062,0.3226750,0.3683167,0.4139583,0.4543917,0.4948250), folded_ranney_4yrs=c(0.8299708,0.8265375,0.8246062,0.8226750,0.8683167,0.9139583,0.9543917,0.9948250), time=c(30,31,32,33,34,35,36,37), censor=c(1,0,1,0,0,1,0,1) );
df;
##     State Year APPT   mood ranney_4yrs folded_ranney_4yrs time censor
## 1 Arizona 1970    3 47.778   0.3299708          0.8299708   30      1
## 2 Arizona 1971    3 51.948   0.3265375          0.8265375   31      0
## 3 Arizona 1972    3 48.429   0.3246062          0.8246062   32      1
## 4 Arizona 1973    3 42.909   0.3226750          0.8226750   33      0
## 5 Arizona 1974    1 40.548   0.3683167          0.8683167   34      0
## 6 Arizona 1975    1 39.517   0.4139583          0.9139583   35      1
## 7 Arizona 1976    1 38.659   0.4543917          0.9543917   36      0
## 8 Arizona 1977    1 36.995   0.4948250          0.9948250   37      1
df[df$censor!=1 | ave(df$censor,df$censor,FUN=function(x) 1:length(x))==1,];
##     State Year APPT   mood ranney_4yrs folded_ranney_4yrs time censor
## 1 Arizona 1970    3 47.778   0.3299708          0.8299708   30      1
## 2 Arizona 1971    3 51.948   0.3265375          0.8265375   31      0
## 4 Arizona 1973    3 42.909   0.3226750          0.8226750   33      0
## 5 Arizona 1974    1 40.548   0.3683167          0.8683167   34      0
## 7 Arizona 1976    1 38.659   0.4543917          0.9543917   36      0
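
As a cross-check, the following sketch compares the ave()-based filter with the duplicated()-based filter from the other answer on the same shuffled censor column. It works on the bare vector only; keep_ave and keep_dup are illustrative names, and seq_along(x) stands in for 1:length(x).

```r
# The shuffled censor column from the demo above.
censor <- c(1, 0, 1, 0, 0, 1, 0, 1)

# Filter based on the per-value running count.
keep_ave <- censor != 1 | ave(censor, censor, FUN = function(x) seq_along(x)) == 1

# Filter based on first occurrences as reported by duplicated().
keep_dup <- censor != 1 | !duplicated(censor)

which(keep_ave)
## [1] 1 2 4 5 7
which(keep_dup)
## [1] 1 2 4 5 7
```

For this particular column both expressions select rows 1, 2, 4, 5 and 7, which matches the output shown above.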