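I have a list of records that I need to de-duplicate. The records are combinations of the same pair of values, but the usual functions for removing duplicate rows don't work, because the rows are not exact duplicates across the two columns. Here is a reproducible example.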
df <- data.frame( A = c("2","2","2","43","43","43","331","391","481","490","501","501","501","502","502","502"),
B = c("43","501","502","2","501","502","491","496","490","481","2","43","502","2","43","501"))
Here is the desired output I'm looking for.
df_Final <- data.frame( A = c("2","2","2","331","391","481"),
B = c("43","501","502","491","496","490"))
Answer 0 (score: 1)
I think the aim is to find where each element of column A first occurs in column B:
idx = match(df$A, df$B)
Keep a row if its element of A is not in B (is.na(idx)), or if the row comes before the first occurrence of that element in B (seq_along(idx) < idx):
df[is.na(idx) | seq_along(idx) < idx,]
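As a quick sanity check, the filter above can be run against the example from the question and compared with the desired output. This is only a sketch: the names keep, result and expected are introduced here for illustration, and stringsAsFactors = FALSE is set explicitly on the assumption that the columns should stay character vectors on older R versions too.

```r
# Example data from the question
df <- data.frame(
  A = c("2","2","2","43","43","43","331","391","481","490",
        "501","501","501","502","502","502"),
  B = c("43","501","502","2","501","502","491","496","490","481",
        "2","43","502","2","43","501"),
  stringsAsFactors = FALSE
)

# Position in B where each element of A first occurs (NA if absent)
idx <- match(df$A, df$B)

# Keep rows whose A value never occurs in B, or that come before
# the first occurrence of that value in B
keep <- is.na(idx) | seq_along(idx) < idx
result <- df[keep, ]
rownames(result) <- NULL  # drop the original row numbers

# Desired output from the question
expected <- data.frame(
  A = c("2","2","2","331","391","481"),
  B = c("43","501","502","491","496","490"),
  stringsAsFactors = FALSE
)

stopifnot(identical(result$A, expected$A),
          identical(result$B, expected$B))
```

On this data the rows kept are 1, 2, 3, 7, 8 and 9 of the original df.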
A more or less literal tidyverse version of this might create and then remove a temporary column:
library(tidyverse)
df %>% mutate(idx = match(A, B)) %>%
filter(is.na(idx) | seq_along(idx) < idx) %>%
select(-idx)
Answer 1 (score: 0)
You can remove all rows that are duplicates under reordering:
require(dplyr)
df %>%
apply(1, sort) %>% t %>%
data.frame %>%
group_by_all %>%
slice(1)
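The same "duplicates under reordering" idea can probably also be written in base R, without sorting each row through apply. The sketch below builds an order-independent key with pmin and pmax and keeps the first row for each key; the names key, lo, hi and dedup are made up for this illustration. Unlike the pipeline above, it keeps the original column names and the original orientation of each retained row.

```r
# Example data from the question
df <- data.frame(
  A = c("2","2","2","43","43","43","331","391","481","490",
        "501","501","501","502","502","502"),
  B = c("43","501","502","2","501","502","491","496","490","481",
        "2","43","502","2","43","501"),
  stringsAsFactors = FALSE
)

# Order-independent key: the smaller and the larger value of each pair
# (character comparison, so the order is alphabetical, not numeric)
key <- data.frame(lo = pmin(df$A, df$B), hi = pmax(df$A, df$B))

# Keep the first row seen for each unordered pair
dedup <- df[!duplicated(key), ]
```

Note that on the example data this keeps nine rows (original rows 1, 2, 3, 5, 6, 7, 8, 9 and 13), including pairs such as 43/501 that the desired output in the question leaves out, so it answers a slightly different question than the match-based filter.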