library(tidyverse)
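Using the sample data at the bottom, I am trying to remove duplicates in the ID column, but only the duplicate rows where the "Year" column equals 2017.
I tried the code below, but it does not seem to work.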
DF <- DF %>%
group_by(ID) %>%
mutate(REMOVE = if_else(duplicated(ID) & Year == 2017, 1, 0))
DF <- DF %>%
group_by(ID) %>%
mutate(REMOVE = if_else(!unique(ID) & Year == 2017, 1, 0))
My idea was to group by "ID" and then use "if_else" to mark with a 1 every row of a duplicated ID whose Year is 2017. I would then use the filter code below to remove all the 1's.
DF <- DF %>%
filter(REMOVE == 1)
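I am not sure why this code does not work. I also tried changing the column types of ID and Year (character, numeric, etc.), but that did not help.
Any help would be greatly appreciated!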
ID<-c(18998878,8888888,57485746,18998878,45454536,64536475,64536475,87966666,58675844,58695847,68574443,87966666)
Program<-c("A111","B488","T687","A111","G888","T444","T444","P867","R444","B323","F888","P867")
Code<-c(1222,4534,543,1222,4678,6544,6544,9898,8888,5656,6666,9898)
Year<-c(2016,2016,2017,2017,2017,2017,2016,2016,2016,2017,2017,2017)
DF<-data_frame(ID,Program,Code,Year)
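As a small illustration of why the two attempts above may not flag the intended rows, the sketch below runs the two building blocks on a toy vector. The names `ids`, `dup_flags` and `neg_unique` are made up for this example and do not appear in the original code. Note also that `filter(REMOVE == 1)` keeps the flagged rows rather than dropping them; `filter(REMOVE == 0)` would be needed to remove them.

```r
# Hypothetical toy vector: one ID that appears twice
ids <- c(64536475, 64536475)

# duplicated() is FALSE for the first occurrence and TRUE for later ones,
# so which row gets flagged depends on the row order
dup_flags <- duplicated(ids)

# unique() returns a single nonzero number here; negating a nonzero
# number with ! gives FALSE, so this condition can never flag a row
neg_unique <- !unique(ids)

print(dup_flags)
print(neg_unique)
```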
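Answer 0 (score: 2)
Sort DF by ID and Year, then use distinct to keep only the first row for each ID. For IDs that appear in both years, that is the Year = 2016 row.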
library(dplyr)
ID <- c(18998878,8888888,57485746,18998878,45454536,64536475,64536475,87966666,
58675844,58695847,68574443,87966666)
Program <- c("A111","B488","T687","A111","G888","T444","T444","P867","R444","B323","F888","P867")
Code <- c(1222,4534,543,1222,4678,6544,6544,9898,8888,5656,6666,9898)
Year <- c(2016,2016,2017,2017,2017,2017,2016,2016,2016,2017,2017,2017)
DF <- data_frame(ID,Program,Code,Year)
DF
#> # A tibble: 12 x 4
#> ID Program Code Year
#> <dbl> <chr> <dbl> <dbl>
#> 1 18998878. A111 1222. 2016.
#> 2 8888888. B488 4534. 2016.
#> 3 57485746. T687 543. 2017.
#> 4 18998878. A111 1222. 2017.
#> 5 45454536. G888 4678. 2017.
#> 6 64536475. T444 6544. 2017.
#> 7 64536475. T444 6544. 2016.
#> 8 87966666. P867 9898. 2016.
#> 9 58675844. R444 8888. 2016.
#> 10 58695847. B323 5656. 2017.
#> 11 68574443. F888 6666. 2017.
#> 12 87966666. P867 9898. 2017.
DF %>%
arrange(ID, Year) %>%
distinct(ID, .keep_all = TRUE)
#> # A tibble: 9 x 4
#> ID Program Code Year
#> <dbl> <chr> <dbl> <dbl>
#> 1 8888888. B488 4534. 2016.
#> 2 18998878. A111 1222. 2016.
#> 3 45454536. G888 4678. 2017.
#> 4 57485746. T687 543. 2017.
#> 5 58675844. R444 8888. 2016.
#> 6 58695847. B323 5656. 2017.
#> 7 64536475. T444 6544. 2016.
#> 8 68574443. F888 6666. 2017.
#> 9 87966666. P867 9898. 2016.
Created on 2018-03-07 by the reprex package (v0.2.0).
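A possible alternative that does not rely on sorting is sketched below. It assumes the goal is "for any ID that occurs more than once, drop its 2017 rows", which is one reading of the question. The name `result` is introduced only for this example. One caveat under that assumption: an ID whose rows are all from 2017 and which occurs more than once would lose every row.

```r
library(dplyr)

# Sample data from the question
DF <- tibble(
  ID = c(18998878, 8888888, 57485746, 18998878, 45454536, 64536475,
         64536475, 87966666, 58675844, 58695847, 68574443, 87966666),
  Program = c("A111", "B488", "T687", "A111", "G888", "T444",
              "T444", "P867", "R444", "B323", "F888", "P867"),
  Code = c(1222, 4534, 543, 1222, 4678, 6544, 6544, 9898, 8888, 5656, 6666, 9898),
  Year = c(2016, 2016, 2017, 2017, 2017, 2017, 2016, 2016, 2016, 2017, 2017, 2017)
)

# Within each ID, drop the 2017 rows only when the ID has more than one row
result <- DF %>%
  group_by(ID) %>%
  filter(!(n() > 1 & Year == 2017)) %>%
  ungroup()

print(result)
```

Because the condition uses the group size instead of row position, the result should not depend on the order of the rows in DF.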
Answer 1 (score: 0)
ID<-c(18998878,8888888,57485746,18998878,45454536,64536475,64536475,87966666,58675844,58695847,68574443,87966666)
Program<-c("A111","B488","T687","A111","G888","T444","T444","P867","R444","B323","F888","P867")
Code<-c(1222,4534,543,1222,4678,6544,6544,9898,8888,5656,6666,9898)
Year<-c(2016,2016,2017,2017,2017,2017,2016,2016,2016,2017,2017,2017)
DF<-data_frame(ID,Program,Code,Year)
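# Drop rows that are a second or later occurrence of an ID and have Year == 2017
filter(DF, !(duplicated(ID) & Year == 2017))
This removes the second or any later occurrence of an ID if its year is 2017. Note that the question gives no example of the expected output, so I may have misunderstood it.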
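Since duplicated() marks the second and later occurrences in row order, a filter built on it depends on how the rows are ordered. The sketch below compares the unsorted and sorted cases on the question's data; `unsorted` and `sorted` are names chosen for this illustration only.

```r
library(dplyr)

# Sample data from the question
DF <- tibble(
  ID = c(18998878, 8888888, 57485746, 18998878, 45454536, 64536475,
         64536475, 87966666, 58675844, 58695847, 68574443, 87966666),
  Program = c("A111", "B488", "T687", "A111", "G888", "T444",
              "T444", "P867", "R444", "B323", "F888", "P867"),
  Code = c(1222, 4534, 543, 1222, 4678, 6544, 6544, 9898, 8888, 5656, 6666, 9898),
  Year = c(2016, 2016, 2017, 2017, 2017, 2017, 2016, 2016, 2016, 2017, 2017, 2017)
)

# Unsorted: ID 64536475 has its 2017 row first, so that row is not
# marked as a duplicate and both of its rows are kept
unsorted <- DF %>%
  filter(!(duplicated(ID) & Year == 2017))

# Sorted by ID and Year: the 2016 row comes first for every repeated ID,
# so the later 2017 row is the one marked as a duplicate and dropped
sorted <- DF %>%
  arrange(ID, Year) %>%
  filter(!(duplicated(ID) & Year == 2017))

print(nrow(unsorted))
print(nrow(sorted))
```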
Answer 2 (score: 0)
Split the data into two data frames, one where Year equals 2017 and one where Year does not equal 2017.
DF1 <- DF %>% filter(Year==2017)
DF2 <- DF %>% filter(Year!=2017)
Then use distinct() to deduplicate DF1 by the ID column. The .keep_all argument keeps the remaining columns.
DF3 <- DF1 %>% distinct(ID, .keep_all = TRUE)
Now you can get the final result by combining DF2 and DF3 with rbind():
df_all <- rbind(DF2, DF3)
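On the sample data, every 2017 ID is already unique within 2017, so the split-and-recombine steps return all 12 rows. If the intent is also to drop 2017 rows whose ID already appears in another year (an assumption about the question, not something this answer states), one possible extension is an anti_join() against DF2 before recombining. A minimal sketch:

```r
library(dplyr)

# Sample data from the question
DF <- tibble(
  ID = c(18998878, 8888888, 57485746, 18998878, 45454536, 64536475,
         64536475, 87966666, 58675844, 58695847, 68574443, 87966666),
  Program = c("A111", "B488", "T687", "A111", "G888", "T444",
              "T444", "P867", "R444", "B323", "F888", "P867"),
  Code = c(1222, 4534, 543, 1222, 4678, 6544, 6544, 9898, 8888, 5656, 6666, 9898),
  Year = c(2016, 2016, 2017, 2017, 2017, 2017, 2016, 2016, 2016, 2017, 2017, 2017)
)

DF1 <- DF %>% filter(Year == 2017)
DF2 <- DF %>% filter(Year != 2017)

# Deduplicate the 2017 rows by ID, then drop any ID already present in DF2
DF3 <- DF1 %>%
  distinct(ID, .keep_all = TRUE) %>%
  anti_join(DF2, by = "ID")

# Recombine the non-2017 rows with the remaining 2017 rows
df_all <- bind_rows(DF2, DF3)

print(df_all)
```

Here bind_rows() is used in place of rbind(); for two tibbles with identical columns the two should behave the same.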