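I have a data frame in R similar to the one I created below (for illustration). For accounts with a duplicated ID (in the example below the IDs are names, but they could also be numbers), I want to write code that removes those rows among the duplicated-ID entries whose Closed value matches an Opened value. For example, the first three rows below are three different accounts belonging to John (a duplicated ID of "John" in the ID column). The first two of the three were both closed on 09/30/2017, which matches the Opened value of the third, so they should be removed from the output data frame. The same goes for Mary: the Closed date of one of her two accounts matches the Opened date of the other, so the closed account should be removed. For Jack and Pete, however, all of their accounts should be kept in the output data frame because, in each case, the Closed date does not match any Opened date. All rows without a duplicated ID (e.g. Jill, Jane, Alice) are also kept in the output data frame.
I have the following code, which uses dplyr to filter on duplicated IDs.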
Input_DF_Dupl_ID <- Input_DF %>%
  group_by(ID) %>%
  filter(n() > 1) %>%
  arrange(ID)
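However, it only identifies and arranges the duplicated accounts; it does not go on to remove the ones that meet the condition above. Also, I do not actually want to remove (filter out) the non-duplicated accounts.
I hope this is clear. I would appreciate any help I can get. Thanks in advance.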
Input_DF:
Date        ID     Opened      Closed      Review      Status  Type  Paid
09/30/2017  John   09/21/2016  09/30/2017  09/30/2019  Closed  A     1000
09/30/2017  John   06/19/2015  09/30/2017  06/30/2020  Closed  A     2500
09/30/2017  John   09/30/2017              14/31/2022  Open    A     0
09/30/2017  Jill   11/10/2014              07/31/2018  Open    B     0
09/30/2017  Jane   07/15/2012  09/30/2017  07/31/2017  Closed  C     10999
09/30/2017  Alice  06/19/2015  09/30/2017  06/30/2020  Closed  A     500
09/30/2017  Mary   11/10/2014  09/30/2017  07/31/2018  Closed  B     12000
09/30/2017  Mary   09/30/2017              07/31/2022  Open    B     0
09/30/2017  Jack   06/19/2011  09/30/2017  06/30/2020  Closed  A     500
09/30/2017  Jack   03/19/2015              06/30/2020  Open    A     0
09/30/2017  Pete   07/15/2012  05/31/2015  07/31/2017  Closed  B     0
09/30/2017  Pete   12/22/2016              07/31/2017  Open    C     0
Desired Output_DF:
Date        ID     Opened      Closed      Review      Status  Type  Paid
09/30/2017  John   09/30/2017              14/31/2022  Open    A     0
09/30/2017  Jill   11/10/2014              07/31/2018  Open    B     0
09/30/2017  Jane   07/15/2012  09/30/2017  07/31/2017  Closed  C     10999
09/30/2017  Alice  06/19/2015  09/30/2017  06/30/2020  Closed  A     500
09/30/2017  Mary   09/30/2017              07/31/2022  Open    B     0
09/30/2017  Jack   06/19/2011  09/30/2017  06/30/2020  Closed  A     500
09/30/2017  Jack   03/19/2015              06/30/2020  Open    A     0
09/30/2017  Pete   07/15/2012  05/31/2015  07/31/2017  Closed  B     0
09/30/2017  Pete   12/22/2016              07/31/2017  Open    C     0
Answer 0 (score: 0)
Please use the following code. Edit: the condition is now applied only to groups with more than one record.
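library(dplyr)
Input_DF_Dupl_ID <- Input_DF %>%
  group_by(ID) %>%
  # Within duplicated IDs, drop closed rows whose Closed date equals an Opened date of the same ID
  filter(!(Status == "Closed" & Closed %in% Opened & n() > 1)) %>%
  arrange(ID)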
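To check the filter against the sample in the question, here is a minimal self-contained sketch. It makes a few assumptions that are not in the original post: only the ID, Opened, Closed and Status columns are rebuilt, the dates are kept as character strings, and the empty Closed cells of open accounts are stored as NA. The name Output_DF is taken from the question's "Desired Output_DF" heading. Unlike the code above, this version skips arrange(ID) so the rows stay in their original order, and it calls ungroup() so the result is no longer a grouped data frame.

```r
library(dplyr)

# Sample rebuilt from the question (assumption: blank Closed cells are NA).
Input_DF <- data.frame(
  ID     = c("John", "John", "John", "Jill", "Jane", "Alice",
             "Mary", "Mary", "Jack", "Jack", "Pete", "Pete"),
  Opened = c("09/21/2016", "06/19/2015", "09/30/2017", "11/10/2014",
             "07/15/2012", "06/19/2015", "11/10/2014", "09/30/2017",
             "06/19/2011", "03/19/2015", "07/15/2012", "12/22/2016"),
  Closed = c("09/30/2017", "09/30/2017", NA, NA,
             "09/30/2017", "09/30/2017", "09/30/2017", NA,
             "09/30/2017", NA, "05/31/2015", NA),
  Status = c("Closed", "Closed", "Open", "Open", "Closed", "Closed",
             "Closed", "Open", "Closed", "Open", "Closed", "Open"),
  stringsAsFactors = FALSE
)

Output_DF <- Input_DF %>%
  group_by(ID) %>%
  # Drop a closed row only when its ID is duplicated and its Closed date
  # equals the Opened date of some account with the same ID.
  filter(!(n() > 1 & Status == "Closed" & Closed %in% Opened)) %>%
  ungroup()

print(Output_DF)
```

On this sample, the two closed John accounts and the closed Mary account are dropped, while Jack, Pete and the single-account IDs are kept. Because %in% is evaluated within each group, a Closed date is only compared with the Opened dates of the same ID. If the real columns are of class Date rather than character, the same comparison should still apply.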