Make all elements unique in a dataframe

Date: 2017-07-12 08:09:47

Tags: r

Suppose I have data like the following.

Across the entire data set there are three A's, two B's, two C's, and one each of D, E, and F.

data <- read.table(textConnection("
col1 col2 
A B
A C
B A
C D
E F
"), header = TRUE)

I want to keep the order and contents the same, but make every value unique across the whole data frame. For example, the three A's become A.1, A.2, and A.3.

col1 col2 
A.1 B.2
A.2 C.2
B.1 A.3
C.1 D
E F

Is there any smart way I can do this?

I know I can use make.unique or make.names, but they appear to work on a single column only, not on the entire dataset.

3 answers:

Answer 0 (score: 5)

Using (with the question's data frame stored as dat):

dat[] <- make.unique(as.character(unlist(dat)))

gives:

> dat
  col1 col2
1    A  B.1
2  A.1  C.1
3    B  A.2
4    C    D
5    E    F
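
Why this works across columns: unlist() flattens the data frame column by column into one vector, make.unique() deduplicates that vector, and assigning into dat[] refills the columns in the same order while keeping the data frame's shape. A minimal self-contained sketch, assuming the question's data is read with stringsAsFactors = FALSE (an assumption made here so that both columns are character):

```r
# Rebuild the question's data frame; stringsAsFactors = FALSE is an
# assumption made here so that both columns are plain character vectors.
dat <- read.table(text = "
col1 col2
A B
A C
B A
C D
E F
", header = TRUE, stringsAsFactors = FALSE)

# unlist() walks the columns in order: all of col1, then all of col2.
flat <- unlist(dat, use.names = FALSE)

# make.unique() leaves the first occurrence alone and suffixes later ones.
dat[] <- make.unique(flat)

# Every value is now distinct and the 5 x 2 shape is unchanged.
stopifnot(!anyDuplicated(unlist(dat)), identical(dim(dat), c(5L, 2L)))
```

Note that the first occurrence keeps its bare name (A, A.1, A.2), which differs slightly from the A.1, A.2, A.3 numbering requested in the question.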

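Answer 1 (score: 4)

The OP requires that the values in the data.frame be unique across all columns. This is a strong indicator that the data would be better reshaped from wide to long format, where all data manipulation can be performed on one column instead of several.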

library(data.table)
DT <- data.table(data)
molten <- melt(DT, measure.vars = names(DT))[
  , value := paste(value, rowid(value), sep = ".")]
molten
    variable value
 1:     col1   A.1
 2:     col1   A.2
 3:     col1   B.1
 4:     col1   C.1
 5:     col1   E.1
 6:     col2   B.2
 7:     col2   C.2
 8:     col2   A.3
 9:     col2   D.1
10:     col2   F.1

The rowid() function is a convenience function that generates a unique row id within each group.
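
To make the counter concrete, here is a base R sketch of the within-group running count that rowid() produces for the melted value column. It uses ave() in place of rowid() so that it runs without data.table; the vector value is typed in by hand from the melted table above, and the name id is illustrative.

```r
# The melted values in column-major order (col1 first, then col2).
value <- c("A", "A", "B", "C", "E", "B", "C", "A", "D", "F")

# Running count of each value's occurrences, restarting per distinct value.
# This is the same sequence that rowid(value) returns in data.table.
id <- ave(seq_along(value), value, FUN = seq_along)
id
# [1] 1 2 1 1 1 2 2 3 1 1

# Pasting value and counter together gives the suffixed labels.
paste(value, id, sep = ".")
```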

Further processing can continue in long format. Finally, the data can be reshaped back to wide format:

molten[, rn := rowid(variable)][, dcast(.SD, rn ~ variable)][, rn := NULL][]
   col1 col2
1:  A.1  B.2
2:  A.2  C.2
3:  B.1  A.3
4:  C.1  D.1
5:  E.1  F.1

Jaap's make.unique() approach can be used as well:

melt(DT, measure.vars = names(DT))[, value := make.unique(value)][]
    variable value
 1:     col1     A
 2:     col1   A.1
 3:     col1     B
 4:     col1     C
 5:     col1     E
 6:     col2   B.1
 7:     col2   C.1
 8:     col2   A.2
 9:     col2     D
10:     col2     F

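Answer 2 (score: 2)

One option is to unlist the dataset, use ave to get a sequence number within each group of equal values, paste that onto the unlisted vector, and assign the result back to the original dataset. Values that occur only once get an empty suffix, and sub then strips the trailing dot this leaves behind.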

v1 <- as.character(unlist(data))
data[] <- sub("\\.$", "", paste(v1, ave(v1, v1,
         FUN = function(x) if(length(x)>1) seq_along(x) else ""), sep="."))
data
#  col1 col2
#1  A.1  B.2
#2  A.2  C.2
#3  B.1  A.3
#4  C.1    D
#5    E    F
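
The same idea can be written without the trailing-dot cleanup by computing the group sizes separately. This is a sketch of an alternative, not the answer's own code; it assumes character columns (stringsAsFactors = FALSE), and the names n, idx, and out are illustrative.

```r
# Question's data, read as character columns (assumption made here).
data <- read.table(text = "
col1 col2
A B
A C
B A
C D
E F
", header = TRUE, stringsAsFactors = FALSE)

v1  <- unlist(data, use.names = FALSE)           # flatten column by column
n   <- ave(seq_along(v1), v1, FUN = length)      # how often each value occurs
idx <- ave(seq_along(v1), v1, FUN = seq_along)   # running count per value

# Suffix only the values that occur more than once; leave singletons as-is.
out <- ifelse(n > 1, paste(v1, idx, sep = "."), v1)

data[] <- out                                    # refill, keeping the shape
data
```

Keeping the count (n) and the running index (idx) as separate vectors makes it easy to change the rule, for example to suffix every value regardless of how often it occurs.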