Remove duplicate row values but keep the rows

Time: 2019-09-23 13:17:19

Tags: r dplyr

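I have duplicated rows in a data frame. I need to blank out the repeated values so that each one appears only once, while keeping every row in place.

I tried duplicated, distinct, and unique, but none of them lets me keep the rows.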

Input:

S.No   Rate   Proportion Control
C11    50     0.9         A
C11    50     0.9         B
C11    50     0.9         A
C21    40     0.8         B
C21    40     0.8         A
C21    40     0.8         A

Expected output:

S.No   Rate   Proportion Control
C11                       A
C11                       B
C11    50     0.9         A
C21                       B
C21                       A
C21    40     0.8         A

3 Answers:

Answer 0: (score: 5)

Try

df[duplicated(df[1:3], fromLast = TRUE),2:3] <- ''

df
#  S.No Rate Proportion Control
#1  C11                       A
#2  C11                       B
#3  C11   50        0.9       A
#4  C21                       B
#5  C21                       A
#6  C21   40        0.8       A
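
One side effect of assigning '' is that Rate and Proportion are coerced to character. The following is a minimal sketch of a variant, assuming NA is an acceptable stand-in for the blank cells (that is an assumption, not something the question asks for). It also stores the logical mask in a hypothetical variable `dup` to show what `duplicated()` returns with `fromLast = TRUE`.

```r
# Sample data copied from the question
df <- data.frame(
  S.No = c("C11", "C11", "C11", "C21", "C21", "C21"),
  Rate = c(50L, 50L, 50L, 40L, 40L, 40L),
  Proportion = c(0.9, 0.9, 0.9, 0.8, 0.8, 0.8),
  Control = c("A", "B", "A", "B", "A", "A"),
  stringsAsFactors = FALSE
)

# TRUE for every row whose first three columns reappear further down,
# i.e. for all but the last occurrence of each combination
dup <- duplicated(df[1:3], fromLast = TRUE)

# Assumption: NA is acceptable instead of '';
# unlike '', it leaves Rate and Proportion numeric
res <- df
res[dup, 2:3] <- NA
res
```

If the blanks are only needed for display, keeping NA in the data and formatting at print time avoids turning numeric columns into text.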

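A dplyr version that gives the same result for this data (note that it checks each column for duplicates separately, rather than columns 1:3 together):

library(dplyr)

# A formula (~) is used here because funs() is deprecated in recent dplyr versions
df %>% 
  mutate_at(vars(2:3), ~ replace(., duplicated(., fromLast = TRUE), ''))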

Answer 1: (score: 2)

Yes, the conditions are not entirely clear, but you can try

library(dplyr)

df %>%
  group_by(S.No) %>%
  mutate_at(2:3, ~replace(., row_number() != n(),''))
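  # Or select the columns by name, which does not depend on how
  # positions are counted in a grouped data frame:
  #mutate_at(vars(Rate,Proportion), ~replace(., row_number() != n(),''))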

#  S.No  Rate  Proportion Control
# <chr> <chr> <chr>      <chr>  
#1 C11   ""    ""         A      
#2 C11   ""    ""         B      
#3 C11   50    0.9        A      
#4 C21   ""    ""         B      
#5 C21   ""    ""         A      
#6 C21   40    0.8        A  

This replaces every entry in the columns Rate and Proportion with an empty string, except in the last row of each group (S.No).
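
In dplyr 1.0.0 and later, `mutate_at()` is superseded by `across()`. The following is a rough sketch of the same grouped idea in that style, assuming dplyr >= 1.0.0 is installed. The explicit `as.character()` and the result name `res` are choices made for this illustration, not part of the answer above.

```r
library(dplyr)

# Sample data copied from the question
df <- data.frame(
  S.No = c("C11", "C11", "C11", "C21", "C21", "C21"),
  Rate = c(50L, 50L, 50L, 40L, 40L, 40L),
  Proportion = c(0.9, 0.9, 0.9, 0.8, 0.8, 0.8),
  Control = c("A", "B", "A", "B", "A", "A"),
  stringsAsFactors = FALSE
)

res <- df %>%
  group_by(S.No) %>%
  # Within each S.No group, blank every row except the last one.
  # Columns are converted to character first so that "" is a valid value.
  mutate(across(c(Rate, Proportion),
                ~ replace(as.character(.x), row_number() != n(), ""))) %>%
  ungroup()

res
```

Selecting the columns by name keeps the call independent of column order and of the grouping column.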

Data

df <- structure(list(S.No = c("C11", "C11", "C11", "C21", "C21", "C21"
), Rate = c(50L, 50L, 50L, 40L, 40L, 40L), Proportion = c(0.9, 
0.9, 0.9, 0.8, 0.8, 0.8), Control = c("A", "B", "A", "B", "A", 
"A")), class = "data.frame", row.names = c(NA, -6L))

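Answer 2: (score: 1)

Another way to do this in dplyr:

library(dplyr)

# A formula (~) replaces the deprecated funs(); columns are selected by name
df %>% group_by(S.No) %>% mutate_at(vars(Rate, Proportion), ~ case_when(
  n() != row_number() ~  "",
  TRUE ~ as.character(.)
))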