Question

I have a data.frame like this:

data.frame(matrix(c(11:13, 21:23, 11:13, 11:13, 31:33, 41:43, 31:33), byrow = TRUE, ncol = 3))

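Now I would like to know which row is a duplicate of which row, returned as an index vector in which duplicated rows share the same index. If a row is not a duplicate of any previous row, it should get the next available index. For this example, the output should be: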

c(1, 2, 1, 1, 3, 4, 3)

I could do this by looping over all pairs of rows, but there must be a more efficient way.

Unfortunately, duplicated only shows which rows are duplicates, not WHICH rows they are duplicates of. Is there a function that can help?

Answer 1

Is this what you are after?

# Your data
d <- data.frame(matrix(c(11:13, 21:23, 11:13, 11:13, 31:33, 41:43, 31:33), byrow = TRUE, ncol = 3))

# Collapse each row into a single string key, then number the keys
# in order of first appearance
key <- apply(d, 1, paste, collapse = "_")
idx <- as.numeric(factor(key, levels = unique(key)))
print(idx)
# [1] 1 2 1 1 3 4 3
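The same idea can be sketched without apply, building the row keys column-wise with paste and numbering them with match. This is only an illustration: the helper name row_group_index and the separator "\r" are assumptions made for this example and do not come from the answer above. It also assumes that the separator never occurs inside the data.

```r
# Hypothetical helper (name and separator are assumptions for this sketch):
# number each row by the order in which its contents first appear.
row_group_index <- function(df, sep = "\r") {
  # Build one string key per row; paste() is vectorized over the columns.
  # unname() avoids clashes between column names and paste() arguments.
  key <- do.call(paste, c(unname(as.list(df)), sep = sep))
  # match() gives, for each key, its position among the distinct keys
  match(key, unique(key))
}

d <- data.frame(matrix(c(11:13, 21:23, 11:13, 11:13, 31:33, 41:43, 31:33),
                       byrow = TRUE, ncol = 3))
row_group_index(d)
# [1] 1 2 1 1 3 4 3
```

Because unique keeps values in order of first appearance, a row that has not been seen before always receives the next unused index, which is what the question asks for.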

Answer 2

As an alternative, you can use group_indices from dplyr:

dplyr::group_indices(df, X1, X2, X3)
# [1] 1 2 1 1 3 4 3

X1, X2, and X3 are the column names of the data frame.

Answer 3

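Another approach uses the grouping function, which is available in newer versions of R.

Get the ordering of the rows that places identical values next to each other: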

grs = do.call(grouping, dat)

Then manipulate the attributes of the result to get the desired output:

ends = attr(grs, "ends")
rep(seq_along(ends), c(ends[1], diff(ends)))[order(grs)]
#[1] 1 2 1 1 3 4 3
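Group numbers produced this way are not guaranteed to follow the order of first appearance that the question asks for; in the example data the two happen to coincide. A minimal sketch of renumbering the groups with match follows. The data frame dat2 and the names sorted_id and first_id are made up for this illustration.

```r
# Hypothetical data in which the first row does not hold the smallest values
dat2 <- data.frame(X1 = c(31L, 11L, 31L, 21L), X2 = c(32L, 12L, 32L, 22L))

# Same steps as above: group identical rows, then expand to one id per row
grs <- do.call(grouping, unname(as.list(dat2)))
ends <- attr(grs, "ends")
sorted_id <- rep(seq_along(ends), c(ends[1], diff(ends)))[order(grs)]

# Renumber so that ids increase in order of first appearance
first_id <- match(sorted_id, unique(sorted_id))
first_id
# [1] 1 2 1 3
```

The final match step does not depend on how the groups were numbered beforehand, so it can be applied to the result of any of the grouping approaches in this thread.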

Answer 4

Another option is .GRP from data.table:

library(data.table)
setDT(df1)[, grp := .GRP , .(X1, X2, X3)]$grp
#[1] 1 2 1 1 3 4 3

Find which row duplicates which row in a data.frame
