Conditional row deletion: delete quasi-identical rows but keep identical ones

Time: 2016-10-07 03:42:55

Tags: r dataframe string-matching delete-row

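I have a data frame that I need to clean based on values that are "quasi-identical" across rows. I only want to delete the observations that are nearly the same as another one but not exactly identical to it. I tried doing this with agrep, but my approach also deletes the identical observations.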

Id<-c("RoLu1976","Rolu1976","RoLu1976","AlBl1989","ThSa1996")
Art<-c("Econometric Policy Evaluation: A Critique","Econometric Policy Evaluations A Critique","Econometric Policy Evaluation: A Critique", "Rules after discretion", "Expectations and the Nonneutrality of Lucas")
Id.1<-c("FiKy1989","FiKy1989","BeBe1983","JoSt1989","JoSt1990")
Art.1<-c("Notes on the Lucas Critique","Notes on the Lucas Critique","The Inconsistency of Optimal Plans","The Inconsistency","Notes on the Lucas")
N<-data.frame(Id,Art,Id.1,Art.1)

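The quasi-identical values in the data frame above are in the Art column of the first two observations; they differ only by "s" versus ":" ("Evaluations" vs. "Evaluation:").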
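To make "quasi-identical" concrete, one can measure how far apart the two titles are. The following is a minimal sketch using base R's adist() and agrepl(); the variable names a, b and d are introduced here purely for illustration.

```r
# Two titles from the Art column (rows 1 and 2 of the example data)
a <- "Econometric Policy Evaluation: A Critique"
b <- "Econometric Policy Evaluations A Critique"

# Generalized Levenshtein distance: number of single-character edits
d <- adist(a, b)[1, 1]
print(d)                  # 1 (":" replaced by "s")

# Approximate matching with the default tolerance treats them as a match
print(agrepl(a, b))       # TRUE

# An exact comparison does not
print(identical(a, b))    # FALSE
```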

For the case above, the final data frame should be (note that identical values are not removed):

Id        Art                                          Id.1       Art.1
RoLu1976  Econometric Policy Evaluation: A Critique    FiKy1989   Notes on the Lucas Critique
RoLu1976  Econometric Policy Evaluation: A Critique    BeBe1983   The Inconsistency of Optimal Plans
AlBl1989  Rules after discretion                       JoSt1989   The Inconsistency
ThSa1996  Expectations and the Nonneutrality of Lucas  JoSt1990   Notes on the Lucas

What I did is this:

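yy = NULL
for(i in seq_along(N$Art)){
  # all titles that approximately match the i-th title
  temp = agrep(N[i,"Art"],N$Art,value=TRUE)
  # replace the title with its first approximate match
  y = ifelse(any(N[i,"Art"]==temp),temp[1],N[i,"Art"])
  yy = c(yy,y)
}
N$Art = yy
# keep only the first row for each (normalised) title
N.2 = N[!duplicated(N$Art), ]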

But it removes both kinds of values: the identical ones and the quasi-identical ones.
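The reason can be seen by inspecting the normalised titles. The sketch below works on the Art vector alone and uses vapply() as a condensed stand-in for the loop (an assumption that holds here because every title matches itself, so the loop always picks the first fuzzy match).

```r
Art <- c("Econometric Policy Evaluation: A Critique",
         "Econometric Policy Evaluations A Critique",
         "Econometric Policy Evaluation: A Critique",
         "Rules after discretion",
         "Expectations and the Nonneutrality of Lucas")

# Map each title to its first approximate match in the column
yy <- vapply(Art, function(a) agrep(a, Art, value = TRUE)[1],
             character(1), USE.NAMES = FALSE)

# After normalisation, rows 2 and 3 both look like repeats of row 1 ...
print(duplicated(yy))    # FALSE  TRUE  TRUE FALSE FALSE

# ... although only row 3 was an exact repeat in the original data
print(duplicated(Art))   # FALSE FALSE  TRUE FALSE FALSE
```

Once the titles are collapsed, duplicated() can no longer tell an exact repeat from a fuzzy one, so both rows are dropped.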

How can I do this?

1 Answer:

Answer 0 (score: 3)

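You can record which entries of the original Art column are exact duplicates and combine that with the result after fuzzy deduplication, e.g.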

originallyDuplicated <- duplicated(N$Art)
# then run your snippet to generate `yy`

You then want to get rid of the rows that are duplicated now but were not duplicated originally:

N[!(duplicated(yy) & !originallyDuplicated),]
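Put together, the approach might look like the following self-contained sketch. The loop is a condensed version of the question's snippet, and stringsAsFactors = FALSE is set explicitly as an assumption so that the titles are plain character strings on any R version.

```r
N <- data.frame(
  Id    = c("RoLu1976", "Rolu1976", "RoLu1976", "AlBl1989", "ThSa1996"),
  Art   = c("Econometric Policy Evaluation: A Critique",
            "Econometric Policy Evaluations A Critique",
            "Econometric Policy Evaluation: A Critique",
            "Rules after discretion",
            "Expectations and the Nonneutrality of Lucas"),
  Id.1  = c("FiKy1989", "FiKy1989", "BeBe1983", "JoSt1989", "JoSt1990"),
  Art.1 = c("Notes on the Lucas Critique", "Notes on the Lucas Critique",
            "The Inconsistency of Optimal Plans", "The Inconsistency",
            "Notes on the Lucas"),
  stringsAsFactors = FALSE
)

# 1. Remember which rows exactly repeat an earlier title
originallyDuplicated <- duplicated(N$Art)

# 2. Collapse each title onto its first approximate match
yy <- character(nrow(N))
for (i in seq_len(nrow(N))) {
  yy[i] <- agrep(N$Art[i], N$Art, value = TRUE)[1]
}

# 3. Drop rows that only became duplicates through the fuzzy collapse
N.2 <- N[!(duplicated(yy) & !originallyDuplicated), ]
print(N.2)
```

With the example data this keeps rows 1, 3, 4 and 5, which matches the desired output in the question.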

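To me, though, rather than an exclusion criterion based purely on the Art column, it would make more sense to exclude a row only if every column in that row is duplicated or almost duplicated elsewhere in the table (e.g., also comparing Art.1, Id.1, Id, etc.).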
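One possible reading of that suggestion is sketched below. It is only an illustration: the helper is_near_dup, the per-column tolerance max_dist = 2, and the choice to compare each row only with earlier rows are all assumptions, not part of the original answer.

```r
N <- data.frame(
  Id    = c("RoLu1976", "Rolu1976", "RoLu1976", "AlBl1989", "ThSa1996"),
  Art   = c("Econometric Policy Evaluation: A Critique",
            "Econometric Policy Evaluations A Critique",
            "Econometric Policy Evaluation: A Critique",
            "Rules after discretion",
            "Expectations and the Nonneutrality of Lucas"),
  Id.1  = c("FiKy1989", "FiKy1989", "BeBe1983", "JoSt1989", "JoSt1990"),
  Art.1 = c("Notes on the Lucas Critique", "Notes on the Lucas Critique",
            "The Inconsistency of Optimal Plans", "The Inconsistency",
            "Notes on the Lucas"),
  stringsAsFactors = FALSE
)

# Hypothetical tolerance: maximum edit distance allowed in each column
max_dist <- 2

# TRUE when row i is close to some earlier row in every column
# without being an exact copy of that row
is_near_dup <- function(i, df, max_dist) {
  for (j in seq_len(i - 1)) {
    d <- mapply(function(x, y) adist(x, y)[1, 1],
                unlist(df[i, ]), unlist(df[j, ]))
    if (all(d <= max_dist) && any(d > 0)) return(TRUE)
  }
  FALSE
}

near_dup <- vapply(seq_len(nrow(N)), is_near_dup, logical(1),
                   df = N, max_dist = max_dist)
N.3 <- N[!near_dup, ]
print(N.3)
```

On the example data only row 2 is flagged, because it differs from row 1 by one character in Id (letter case) and one in Art while Id.1 and Art.1 are equal. Row 3 shares Id and Art with row 1 but has a different Id.1 and Art.1, so it is kept.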