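I have duplicated rows in a data frame. I need to blank out the duplicated values and keep them in only one row, while leaving the rows themselves in place.
Using duplicated, distinct, or unique does not work for me, because those approaches drop the rows instead of keeping them.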
Input:
S.No  Rate  Proportion  Control
C11   50    0.9         A
C11   50    0.9         B
C11   50    0.9         A
C21   40    0.8         B
C21   40    0.8         A
C21   40    0.8         A
Expected output:
S.No  Rate  Proportion  Control
C11                     A
C11                     B
C11   50    0.9         A
C21                     B
C21                     A
C21   40    0.8         A
Answer 0 (score: 5)
Try
df[duplicated(df[1:3], fromLast = TRUE),2:3] <- ''
df
#   S.No Rate Proportion Control
#1   C11                       A
#2   C11                       B
#3   C11   50        0.9       A
#4   C21                       B
#5   C21                       A
#6   C21   40        0.8       A
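To see why this works, it may help to look at the logical mask on its own. The sketch below rebuilds the example data frame with the same values as in the question and shows what duplicated() returns with fromLast = TRUE. The variable name dup is only illustrative.

```r
# Example data, same values as in the question
df <- data.frame(
  S.No = c("C11", "C11", "C11", "C21", "C21", "C21"),
  Rate = c(50L, 50L, 50L, 40L, 40L, 40L),
  Proportion = c(0.9, 0.9, 0.9, 0.8, 0.8, 0.8),
  Control = c("A", "B", "A", "B", "A", "A"),
  stringsAsFactors = FALSE
)

# fromLast = TRUE scans from the bottom, so the last occurrence of each
# (S.No, Rate, Proportion) combination is the one that is NOT flagged
dup <- duplicated(df[1:3], fromLast = TRUE)
dup
# [1]  TRUE  TRUE FALSE  TRUE  TRUE FALSE

# Blank out Rate and Proportion in the flagged rows only;
# assigning "" turns both columns into character
df[dup, 2:3] <- ""
```

Without fromLast = TRUE the mask would keep the first occurrence instead of the last, which does not match the expected output in the question.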
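A rough dplyr equivalent is shown below. Note that it checks for duplicates within each column separately, so it gives the same result here only because Rate and Proportion repeat together with S.No:
library(dplyr)
df %>%
  mutate_at(vars(2:3), ~ replace(., duplicated(., fromLast = TRUE), ''))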
Answer 1 (score: 2)
Yes, the condition is unclear, but you could try
library(dplyr)
df %>%
group_by(S.No) %>%
mutate_at(2:3, ~replace(., row_number() != n(),''))
#OR
#mutate_at(vars(Rate,Proportion), ~replace(., row_number() != n(),''))
# S.No Rate Proportion Control
# <chr> <chr> <chr> <chr>
#1 C11 "" "" A
#2 C11 "" "" B
#3 C11 50 0.9 A
#4 C21 "" "" B
#5 C21 "" "" A
#6 C21 40 0.8 A
This replaces every entry in the columns Rate and Proportion with an empty string, except in the last row of each group (S.No).
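For readers on a recent dplyr (1.0 or later), the same grouped idea can be sketched with across(), which supersedes mutate_at(). This is a minimal sketch, not the answerer's code: the name result and the explicit as.character() call are my own additions, the latter so that the column type does not depend on whether any value in a group is actually replaced.

```r
library(dplyr)

# Example data, same values as in the question
df <- data.frame(
  S.No = c("C11", "C11", "C11", "C21", "C21", "C21"),
  Rate = c(50L, 50L, 50L, 40L, 40L, 40L),
  Proportion = c(0.9, 0.9, 0.9, 0.8, 0.8, 0.8),
  Control = c("A", "B", "A", "B", "A", "A"),
  stringsAsFactors = FALSE
)

result <- df %>%
  group_by(S.No) %>%
  # Within each S.No group, blank every row except the last one
  mutate(across(c(Rate, Proportion),
                ~ replace(as.character(.x), row_number() != n(), ""))) %>%
  ungroup()

result
```

Selecting the columns by name rather than by position avoids any ambiguity about how positions are counted once the data is grouped.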
Data
df <- structure(list(S.No = c("C11", "C11", "C11", "C21", "C21", "C21"
), Rate = c(50L, 50L, 50L, 40L, 40L, 40L), Proportion = c(0.9,
0.9, 0.9, 0.8, 0.8, 0.8), Control = c("A", "B", "A", "B", "A",
"A")), class = "data.frame", row.names = c(NA, -6L))
Answer 2 (score: 1)
Another way to do this in dplyr:
library(dplyr)
df %>%
  group_by(S.No) %>%
  # blank every row of the group except the last one
  mutate_at(2:3, ~ case_when(
    row_number() != n() ~ "",
    TRUE ~ as.character(.)
  ))
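One side effect shared by all of the approaches above is that writing '' into Rate and Proportion turns both columns into character. If the blanks are only needed for display, a possible alternative is to write NA instead, which keeps the numeric types. This is an assumption about the goal, not something the question asks for. A minimal base R sketch:

```r
# Example data, same values as in the question
df <- data.frame(
  S.No = c("C11", "C11", "C11", "C21", "C21", "C21"),
  Rate = c(50L, 50L, 50L, 40L, 40L, 40L),
  Proportion = c(0.9, 0.9, 0.9, 0.8, 0.8, 0.8),
  Control = c("A", "B", "A", "B", "A", "A"),
  stringsAsFactors = FALSE
)

# Same mask as in the base R answer, but NA instead of "" so that
# Rate stays integer and Proportion stays double
df[duplicated(df[1:3], fromLast = TRUE), 2:3] <- NA

str(df)
```

The NA cells can then be rendered as blanks at output time, for example with write.csv(df, na = ""), while the columns remain usable for arithmetic.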