Hi all, I need help removing duplicate rows from a df, but only when a column is above a threshold.
Here is a data frame:
Group Species Values
1 G1 Cattus_cattus 10
2 G1 Cattus_cattus 10
3 G1 Cattus_cattus 10
4 G2 Canis_lupus 2
5 G2 Canis_lupus 2
6 G3 Griseus_lupa 90
7 G4 Griseus_lupa 89
I want to remove rows that are duplicated on c(Group, Species) when Values > 5.
Here I should get:
Group Species Values
1 G1 Cattus_cattus 10
4 G2 Canis_lupus 2
5 G2 Canis_lupus 2
6 G3 Griseus_lupa 90
7 G4 Griseus_lupa 89
Data:
structure(list(Group = structure(c(1L, 1L, 1L, 2L, 2L, 3L, 4L
), .Label = c("G1", "G2", "G3", "G4"), class = "factor"), Species = structure(c(2L,
2L, 2L, 1L, 1L, 3L, 3L), .Label = c("Canis_lupus", "Cattus_cattus",
"Griseus_lupa"), class = "factor"), Values = c(10L, 10L, 10L,
2L, 2L, 90L, 89L)), class = "data.frame", row.names = c(NA, -7L
))
Answer 0 (score: 3)
You can use duplicated and combine it with an OR (|) test on x$Values <= 5:
x[!duplicated(x) | x$Values <= 5,]
#x[!(duplicated(x) & x$Values > 5),] #Alternative
# Group Species Values
#1 G1 Cattus_cattus 10
#4 G2 Canis_lupus 2
#5 G2 Canis_lupus 2
#6 G3 Griseus_lupa 90
#7 G4 Griseus_lupa 89
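To see how the row mask is built, here is a sketch that rebuilds the question's data with data.frame() (same values as the dput() output above) and prints the two logical vectors side by side. It also checks that the OR form and the commented alternative select the same rows, which follows from De Morgan's law: !(A & B) is the same as !A | !B.

```r
# Rebuild the question's data (same values as the dput() output)
x <- data.frame(
  Group   = factor(c("G1", "G1", "G1", "G2", "G2", "G3", "G4")),
  Species = factor(c("Cattus_cattus", "Cattus_cattus", "Cattus_cattus",
                     "Canis_lupus", "Canis_lupus", "Griseus_lupa", "Griseus_lupa")),
  Values  = c(10L, 10L, 10L, 2L, 2L, 90L, 89L)
)

dup   <- duplicated(x)       # TRUE for the 2nd and later copies of an identical row
small <- x$Values <= 5       # TRUE where the threshold rule does not apply

keep_or  <- !dup | small     # keep if first occurrence OR value is small
keep_and <- !(dup & !small)  # drop only if duplicated AND value > 5

# De Morgan: !(A & B) == !A | !B, so both masks must agree
stopifnot(identical(keep_or, keep_and))
print(data.frame(dup, small, keep = keep_or))
```

Rows 2, 3 and 5 are flagged as duplicates, but row 5 survives because its value is 2.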
Or, to test duplication on Group and Species only:
x[!(duplicated(x[c("Group","Species")]) & x$Values > 5),]
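If the key columns or the threshold change often, the same idea can be wrapped in a function. The sketch below uses a hypothetical helper, drop_dups_above(), which is not part of any package. It tests duplication on the given key columns only, and it treats NA values as not above the threshold (an assumption, since the question's data has no NA).

```r
# Hypothetical helper (not from any package): drop rows that repeat an
# earlier row on `keys`, but only when `value_col` is above `threshold`.
drop_dups_above <- function(df, keys, value_col, threshold) {
  dup <- duplicated(df[keys])            # repeat of an earlier key combination
  v   <- df[[value_col]]
  big <- !is.na(v) & v > threshold       # assumption: NA never triggers removal
  df[!(dup & big), , drop = FALSE]
}

x <- data.frame(
  Group   = factor(c("G1", "G1", "G1", "G2", "G2", "G3", "G4")),
  Species = factor(c("Cattus_cattus", "Cattus_cattus", "Cattus_cattus",
                     "Canis_lupus", "Canis_lupus", "Griseus_lupa", "Griseus_lupa")),
  Values  = c(10L, 10L, 10L, 2L, 2L, 90L, 89L)
)

res <- drop_dups_above(x, keys = c("Group", "Species"),
                       value_col = "Values", threshold = 5)
print(res)
```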
Answer 1 (score: 2)
Using dplyr:
library(dplyr)
x %>%
  filter(!duplicated(x) | Values <= 5)
Answer 2 (score: 2)
library(dplyr)
df %>%
  group_by(Group, Species) %>%
  slice(if (any(Values > 5)) 1 else 1:n())
# Output:
# # Groups: Group, Species [4]
#   Group Species       Values
#   <fct> <fct>          <int>
# 1 G1    Cattus_cattus     10
# 2 G2    Canis_lupus        2
# 3 G2    Canis_lupus        2
# 4 G3    Griseus_lupa      90
# 5 G4    Griseus_lupa      89
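A base R sketch of the same group-wise rule, for readers not using dplyr: ave() flags every Group/Species group that has any value above 5, and !duplicated() marks the first row of each group. Like the slice() version, it keeps only the first row of such a group even if some of its rows are at or below 5, so on a mixed group it can keep fewer rows than the row-wise duplicated() approach.

```r
x <- data.frame(
  Group   = factor(c("G1", "G1", "G1", "G2", "G2", "G3", "G4")),
  Species = factor(c("Cattus_cattus", "Cattus_cattus", "Cattus_cattus",
                     "Canis_lupus", "Canis_lupus", "Griseus_lupa", "Griseus_lupa")),
  Values  = c(10L, 10L, 10L, 2L, 2L, 90L, 89L)
)

# Per Group/Species group: TRUE on every row if any value in the group exceeds 5
any_big <- ave(x$Values > 5, x$Group, x$Species, FUN = any)
# TRUE on the first row of each Group/Species combination
first <- !duplicated(x[c("Group", "Species")])

# Keep only the first row of "big" groups, and every row of the other groups
res <- x[first | !any_big, ]
print(res)
```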