How to track duplicate rows in a data frame when reducing it with unique(df)?

Date: 2017-07-12 21:03:35

Tags: r

This is a follow-up to this question.

Imagine the following data frame:

a <- c(rep("A", 3), rep("B", 3), rep("A",2))
b <- c(1,1,2,4,1,1,2,2)
df <-data.frame(a,b)

which gives

  a b
1 A 1
2 A 1
3 A 2
4 B 4
5 B 1
6 B 1
7 A 2
8 A 2

I reduce it to its unique rows:

df_unique <- unique(df)

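Now I would like to know how to keep track of the rows that were merged. I want to create a new column in which each entry holds the list of row names that were collapsed into that unique row, like this: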

df_unique_informative =   
  a b track
1 A 1 [1,2]
3 A 2 [3,7,8]
4 B 4 [4]
5 B 1 [5,6]

2 Answers:

Answer 0 (score: 4)

res = aggregate(x = list(track = 1:NROW(df)), by = list(a = df$a, b = df$b), function(x) x)
# OR perhaps you want
#res = aggregate(x = list(track = 1:NROW(df)), by = list(a = df$a, b = df$b), function(x)
#                                                                paste(x, collapse = ", "))
res
#  a b   track
#1 A 1    1, 2
#2 B 1    5, 6
#3 A 2 3, 7, 8
#4 B 4       4

#Shorter code
res = aggregate(list(track = 1:NROW(df)), df[,1:2], '[')
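Note that aggregate() reorders the groups and renumbers the rows, whereas the desired output in the question keeps the row names of unique(df) (1, 3, 4, 5). A minimal base R sketch that preserves them, assuming default integer row names and that no value contains the "\r" separator used to build the key (the names key and groups are illustrative, not from the original answer):

```r
a <- c(rep("A", 3), rep("B", 3), rep("A", 2))
b <- c(1, 1, 2, 4, 1, 1, 2, 2)
df <- data.frame(a, b)

# One string key per row; identical rows get identical keys
key <- do.call(paste, c(df, sep = "\r"))

# Row positions grouped by key
groups <- split(seq_len(nrow(df)), key)

# Keep the first occurrence of each row, as unique(df) does
first <- !duplicated(key)
df_unique <- df[first, , drop = FALSE]

# Attach the positions of all rows that collapsed into each unique row
df_unique$track <- unname(groups[key[first]])
df_unique
```

Here track is a list column, so df_unique$track[[2]] returns the positions of all rows equal to the second unique row.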

Update

a <- c(rep("A", 3), rep("B", 3), rep("A",2))
b <- c(1,1,2,4,1,1,2,2)
c = letters[1:8]
df <-data.frame(a,b,c, stringsAsFactors = FALSE)
res = aggregate(x = list(track = 1:NROW(df)), by = list(a = df$a, b = df$b), function(x) df$c[x])
res
#  a b   track
#1 A 1    a, b
#2 B 1    e, f
#3 A 2 c, g, h
#4 B 4       d
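Since track is a list column, it may help to see how it can be inspected afterwards. The sketch below uses the two-column df from the question and passes simplify = FALSE, on the assumption that a list column is wanted even when every group happens to have the same size (otherwise aggregate() may return a matrix column instead). The column names n and track_chr are illustrative.

```r
a <- c(rep("A", 3), rep("B", 3), rep("A", 2))
b <- c(1, 1, 2, 4, 1, 1, 2, 2)
df <- data.frame(a, b)

# simplify = FALSE keeps each group's row positions as a list element
res <- aggregate(list(track = seq_len(nrow(df))),
                 by = df[c("a", "b")],
                 FUN = function(x) x,
                 simplify = FALSE)

# Number of original rows behind each unique row
res$n <- lengths(res$track)

# Flat character version of the list column, e.g. for writing to CSV
res$track_chr <- vapply(res$track, toString, character(1))
res
```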

Answer 1 (score: 0)

Here is an option using the tidyverse:

library(tidyverse)
# Move the row names into a column, then collect them per unique (a, b) pair
rownames_to_column(df, 'rn') %>%
  group_by(a, b) %>%
  summarise(track = list(rn))