My data is data.lab:
data.lab <- data.frame(Name = c("A", "e", "b", "c", "d"),
                       bp = c(12, 12, 11, 12, 11),
                       sugar = c(19, 21, 23, 19, 23))
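I only want the names whose bp and sugar values are duplicated, along with a reference column.
Desired output:
lab.data <- data.frame(Name = c("A", "b", "c", "d"),
                       bp = c(12, 11, 12, 11),
                       sugar = c(19, 23, 19, 23),
                       pair = c(1, 1, 2, 2))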
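# Compare on bp and sugar only: Name is unique, so whole-row comparison finds no duplicates
key <- data.lab[c("bp", "sugar")]
dub.data <- duplicated(key) | duplicated(key, fromLast = TRUE)
out.1 <- data.lab[dub.data, ]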
This gives the duplicated rows, but I also need a column that identifies the duplicate pairs.
Answer 0: (score: 2)
With dplyr, you can do the following:
library(dplyr)

data.lab %>%
  group_by(bp, sugar) %>%
  filter(n() == 2) %>%
  mutate(pair = seq_along(Name))
  Name     bp sugar  pair
  <fct> <dbl> <dbl> <int>
1 A        12    19     1
2 b        11    23     1
3 c        12    19     2
4 d        11    23     2
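The result of the pipeline above is still grouped by bp and sugar. A minimal self-contained sketch, assuming dplyr is installed, that drops the grouping and converts back to a plain data frame (the name `res` is only illustrative):

```r
library(dplyr)

data.lab <- data.frame(Name = c("A", "e", "b", "c", "d"),
                       bp = c(12, 12, 11, 12, 11),
                       sugar = c(19, 21, 23, 19, 23))

res <- data.lab %>%
  group_by(bp, sugar) %>%        # rows with equal bp and sugar form one group
  filter(n() == 2) %>%           # keep only groups of exactly two rows
  mutate(pair = row_number()) %>% # index of each row within its group
  ungroup() %>%                  # drop the grouping for later steps
  as.data.frame()                # back to a plain data frame

res
```

Dropping the grouping is optional, but it avoids surprises if further `mutate()` or `summarise()` calls are chained afterwards.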
Or:
data.lab %>%
  group_by(bp, sugar) %>%
  filter(n() == 2) %>%
  mutate(pair = row_number())
Or, if the duplicates can come in groups larger than pairs:
data.lab %>%
  group_by(bp, sugar) %>%
  filter(n() > 1) %>%
  mutate(pair = seq_along(Name))
Or:
data.lab %>%
  group_by(bp, sugar) %>%
  filter(n() > 1) %>%
  mutate(pair = row_number())
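To see how `n() == 2` and `n() > 1` differ, here is a small sketch on made-up data (`data.three` is hypothetical and not part of the question) in which one bp/sugar combination occurs three times:

```r
library(dplyr)

# Hypothetical data: (12, 19) occurs three times, (11, 23) twice, (12, 21) once
data.three <- data.frame(Name = c("A", "e", "b", "c", "d", "f"),
                         bp = c(12, 12, 11, 12, 11, 12),
                         sugar = c(19, 21, 23, 19, 23, 19))

# Keeps only groups of exactly two rows, so the triple is dropped
exact_pairs <- data.three %>%
  group_by(bp, sugar) %>%
  filter(n() == 2) %>%
  mutate(pair = row_number()) %>%
  ungroup()

# Keeps every group with at least two rows, so the triple is kept
any_dups <- data.three %>%
  group_by(bp, sugar) %>%
  filter(n() > 1) %>%
  mutate(pair = row_number()) %>%
  ungroup()

exact_pairs$Name  # "b" "d"
any_dups$Name     # "A" "b" "c" "d" "f"
any_dups$pair     # 1 1 2 2 3
```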
Or group by all variables except Name:
data.lab %>%
  group_by_at(vars(-matches("(Name)"))) %>%
  filter(n() > 1) %>%
  mutate(pair = seq_along(Name))
Or:
data.lab %>%
  group_by_at(vars(-matches("(Name)"))) %>%
  filter(n() > 1) %>%
  mutate(pair = row_number())
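In dplyr 1.0.0 and later the scoped verbs such as `group_by_at()` are superseded by `across()`. A sketch of the same grouping in that style, assuming a recent dplyr version (the name `res_across` is only illustrative):

```r
library(dplyr)

data.lab <- data.frame(Name = c("A", "e", "b", "c", "d"),
                       bp = c(12, 12, 11, 12, 11),
                       sugar = c(19, 21, 23, 19, 23))

res_across <- data.lab %>%
  group_by(across(-Name)) %>%    # group by every column except Name
  filter(n() > 1) %>%            # keep groups with at least two rows
  mutate(pair = row_number()) %>%
  ungroup()

res_across
```

Selecting with `-Name` instead of `-matches("(Name)")` excludes only the column called exactly Name, whereas `matches()` would also exclude any other column whose name contains that pattern.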
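Answer 1: (score: 1)
Continuing from your approach, we can use ave from base R:
dat1 <- data.lab[duplicated(data.lab[c("bp", "sugar")]) |
                 duplicated(data.lab[c("bp", "sugar")], fromLast = TRUE), ]
# Pass a numeric index to ave so the result is an integer whether Name is a factor or character
dat1$pair <- with(dat1, ave(seq_along(Name), bp, sugar, FUN = seq_along))
dat1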
#  Name bp sugar pair
#1    A 12    19    1
#3    b 11    23    1
#4    c 12    19    2
#5    d 11    23    2
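The same base R idea can be wrapped in a small helper that works for any set of key columns. This is only a sketch: `add_pair_index` is a hypothetical name, not a function from base R or any package.

```r
data.lab <- data.frame(Name = c("A", "e", "b", "c", "d"),
                       bp = c(12, 12, 11, 12, 11),
                       sugar = c(19, 21, 23, 19, 23))

# Hypothetical helper: keep rows duplicated on `keys` and number them within each group
add_pair_index <- function(df, keys) {
  key <- df[keys]
  # TRUE for every row whose key values occur more than once
  is_dup <- duplicated(key) | duplicated(key, fromLast = TRUE)
  out <- df[is_dup, , drop = FALSE]
  # interaction() combines all key columns into a single grouping factor
  grp <- interaction(out[keys], drop = TRUE)
  # Running index within each group, in original row order
  out$pair <- ave(seq_len(nrow(out)), grp, FUN = seq_along)
  rownames(out) <- NULL  # reset row names left over from subsetting
  out
}

dat2 <- add_pair_index(data.lab, c("bp", "sugar"))
dat2
```

Resetting the row names is a design choice: without it the result keeps the row names of the original data frame (1, 3, 4, 5 for this data).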