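Compare rows with duplicate values (in a specified column), then delete rows based on a condition determined by the values in two other specified columns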

Time: 2017-10-04 03:42:18

Tags: r

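I have a data frame in R similar to the one I created below (for illustration). For accounts with duplicate IDs (in the example below the ID is a name, but it could also be a number), I want to write some code that removes those duplicate-ID rows whose Closed value matches an Opened value of the same ID. For example, the first three rows below are 3 different accounts belonging to John (the duplicate ID "John" in the ID column). The first two (of the set of three) were both closed on 09/30/2017, which matches the Opened value of the third, so they should be removed from the output data frame. The same goes for Mary: the closing date of one of her two accounts matches the opening date of the other, so the closed account should be removed. However, for Jack and Pete, all of their accounts should be kept in the output data frame because, in each case, the closing date does not match an opening date. All rows without a duplicate ID (e.g. Jill, Jane, Alice) are also kept in the output data frame.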

I have the following code that uses dplyr to filter by duplicate ID.

library(dplyr)

# Keep only the rows whose ID occurs more than once, sorted by ID
Input_DF_Dupl_ID <- Input_DF %>% 
  group_by(ID) %>% 
  filter(n() > 1) %>%
  arrange(ID)

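However, it only identifies and arranges the duplicate accounts; it does not go on to remove the accounts that meet the condition described above. Also, I don't actually want to remove (filter out) the non-duplicated accounts.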
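One way to keep the non-duplicated accounts is to flag duplicate IDs instead of filtering on them, so the removal rule can later be applied only where the flag is TRUE. A minimal base-R sketch on a hypothetical three-row data frame (only the ID column from the question is used):

```r
# Hypothetical toy data; only the ID column from the question is reproduced.
df <- data.frame(ID = c("John", "Jill", "John"), stringsAsFactors = FALSE)

# duplicated() marks only the second and later occurrences of a value, so it is
# combined with fromLast = TRUE to flag every row whose ID appears more than once.
df$dup_id <- duplicated(df$ID) | duplicated(df$ID, fromLast = TRUE)

# Single-account IDs (here "Jill") stay in the data with dup_id == FALSE.
print(df)
```

Unlike `filter(n() > 1)`, this keeps all rows and only adds a logical column.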

I hope this is clear, and I appreciate any help I can get. Thanks in advance...

Input_DF:

Date       ID    Opened     Closed     Review      Status  Type Paid
09/30/2017 John  09/21/2016 09/30/2017 09/30/2019  Closed  A    1000
09/30/2017 John  06/19/2015 09/30/2017 06/30/2020  Closed  A    2500
09/30/2017 John  09/30/2017            14/31/2022  Open    A    0
09/30/2017 Jill  11/10/2014            07/31/2018  Open    B    0
09/30/2017 Jane  07/15/2012 09/30/2017 07/31/2017  Closed  C    10999
09/30/2017 Alice 06/19/2015 09/30/2017 06/30/2020  Closed  A    500
09/30/2017 Mary  11/10/2014 09/30/2017 07/31/2018  Closed  B    12000
09/30/2017 Mary  09/30/2017            07/31/2022  Open    B    0
09/30/2017 Jack  06/19/2011 09/30/2017 06/30/2020  Closed  A    500
09/30/2017 Jack  03/19/2015            06/30/2020  Open    A    0
09/30/2017 Pete  07/15/2012 05/31/2015 07/31/2017  Closed  B    0
09/30/2017 Pete  12/22/2016            07/31/2017  Open    C    0

Desired Output_DF:

Date       ID    Opened     Closed     Review      Status  Type Paid
09/30/2017 John  09/30/2017            14/31/2022  Open    A    0
09/30/2017 Jill  11/10/2014            07/31/2018  Open    B    0
09/30/2017 Jane  07/15/2012 09/30/2017 07/31/2017  Closed  C    10999
09/30/2017 Alice 06/19/2015 09/30/2017 06/30/2020  Closed  A    500
09/30/2017 Mary  09/30/2017            07/31/2022  Open    B    0
09/30/2017 Jack  06/19/2011 09/30/2017 06/30/2020  Closed  A    500
09/30/2017 Jack  03/19/2015            06/30/2020  Open    A    0
09/30/2017 Pete  07/15/2012 05/31/2015 07/31/2017  Closed  B    0
09/30/2017 Pete  12/22/2016            07/31/2017  Open    C    0
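To make the example reproducible, the Input_DF table from the question can be rebuilt in R. This is a sketch under stated assumptions: the blank Closed cells of open accounts are stored as NA, and the date columns are kept as character strings (the Review value "14/31/2022" is not a valid date, so parsing it as a Date would fail or give NA).

```r
# Rebuilds the question's Input_DF as a base-R data frame.
# Assumptions: dates stay as character strings, blank Closed cells become NA.
Input_DF <- data.frame(
  Date   = rep("09/30/2017", 12),
  ID     = c("John", "John", "John", "Jill", "Jane", "Alice",
             "Mary", "Mary", "Jack", "Jack", "Pete", "Pete"),
  Opened = c("09/21/2016", "06/19/2015", "09/30/2017", "11/10/2014",
             "07/15/2012", "06/19/2015", "11/10/2014", "09/30/2017",
             "06/19/2011", "03/19/2015", "07/15/2012", "12/22/2016"),
  Closed = c("09/30/2017", "09/30/2017", NA, NA,
             "09/30/2017", "09/30/2017", "09/30/2017", NA,
             "09/30/2017", NA, "05/31/2015", NA),
  Review = c("09/30/2019", "06/30/2020", "14/31/2022", "07/31/2018",
             "07/31/2017", "06/30/2020", "07/31/2018", "07/31/2022",
             "06/30/2020", "06/30/2020", "07/31/2017", "07/31/2017"),
  Status = c("Closed", "Closed", "Open", "Open", "Closed", "Closed",
             "Closed", "Open", "Closed", "Open", "Closed", "Open"),
  Type   = c("A", "A", "A", "B", "C", "A", "B", "B", "A", "A", "B", "C"),
  Paid   = c(1000, 2500, 0, 0, 10999, 500, 12000, 0, 500, 0, 0, 0),
  stringsAsFactors = FALSE
)
```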

1 answer:

Answer 0 (score: 0)

Please use the following code. Edit: the condition is now applied only to groups with more than one record.

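library(dplyr)

# Within each ID group of size > 1, drop closed accounts whose Closed date
# equals an Opened date of the same ID; single-account IDs are always kept
Input_DF_Dupl_ID <- Input_DF %>% 
  group_by(ID) %>% 
  filter(!(Status == "Closed" & Closed %in% Opened & n() > 1)) %>%
  arrange(ID)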
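For comparison, here is a dependency-free base-R sketch of the same rule; it is not the answerer's code. Assumptions: Input_DF is rebuilt from the question's table with blank Closed cells as NA and dates kept as strings, so matching is exact string equality. Unlike the dplyr version, which sorts the result by ID via `arrange(ID)`, it keeps the rows in their original order, as in the desired output. Note that in both versions `Closed %in% Opened` also compares a row against its own Opened date, so a closed account opened and closed on the same day would be dropped too.

```r
# Input_DF rebuilt from the question (dates as strings, blank Closed -> NA).
Input_DF <- data.frame(
  Date   = rep("09/30/2017", 12),
  ID     = c("John", "John", "John", "Jill", "Jane", "Alice",
             "Mary", "Mary", "Jack", "Jack", "Pete", "Pete"),
  Opened = c("09/21/2016", "06/19/2015", "09/30/2017", "11/10/2014",
             "07/15/2012", "06/19/2015", "11/10/2014", "09/30/2017",
             "06/19/2011", "03/19/2015", "07/15/2012", "12/22/2016"),
  Closed = c("09/30/2017", "09/30/2017", NA, NA,
             "09/30/2017", "09/30/2017", "09/30/2017", NA,
             "09/30/2017", NA, "05/31/2015", NA),
  Review = c("09/30/2019", "06/30/2020", "14/31/2022", "07/31/2018",
             "07/31/2017", "06/30/2020", "07/31/2018", "07/31/2022",
             "06/30/2020", "06/30/2020", "07/31/2017", "07/31/2017"),
  Status = c("Closed", "Closed", "Open", "Open", "Closed", "Closed",
             "Closed", "Open", "Closed", "Open", "Closed", "Open"),
  Type   = c("A", "A", "A", "B", "C", "A", "B", "B", "A", "A", "B", "C"),
  Paid   = c(1000, 2500, 0, 0, 10999, 500, 12000, 0, 500, 0, 0, 0),
  stringsAsFactors = FALSE
)

# drop[i] becomes TRUE for rows that meet the removal rule.
drop <- logical(nrow(Input_DF))
for (id in unique(Input_DF$ID)) {
  rows <- which(Input_DF$ID == id)
  # Only IDs with more than one account are checked; singletons are kept.
  if (length(rows) > 1) {
    # NA %in% Opened is FALSE, so open accounts (Closed = NA) never match.
    drop[rows] <- Input_DF$Status[rows] == "Closed" &
      Input_DF$Closed[rows] %in% Input_DF$Opened[rows]
  }
}

# Keep the remaining rows in their original order and renumber them.
Output_DF <- Input_DF[!drop, ]
rownames(Output_DF) <- NULL
print(Output_DF)
```

On the question's data this removes John's two closed accounts and Mary's closed account, leaving the nine rows of the desired Output_DF.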