Suppose I have data that looks like the example below.
Across the entire data set there are 3 A's, 2 B's, 2 C's, and only one each of D, E, and F.
data <- read.table(textConnection("
col1 col2
A B
A C
B A
C D
E F
"), header = TRUE)
What I want to do is keep the order and contents the same, BUT make every value unique across the whole data set. For example, the three A's become A.1, A.2, and A.3, while values that occur only once stay as they are.
col1 col2
A.1 B.2
A.2 C.2
B.1 A.3
C.1 D
E F
Is there a smart way to do this?
I know I can use make.unique or make.names, but it looks like they only work on a single column, not on the entire data set.
Answer 0 (score: 5)
Use (where dat is the data frame from the question):
dat[] <- make.unique(as.character(unlist(dat)))
which gives:
> dat
  col1 col2
1    A  B.1
2  A.1  C.1
3    B  A.2
4    C    D
5    E    F
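As a minimal sketch of why the one-liner above works: unlist() flattens the data frame column by column, make.unique() leaves the first occurrence of each value bare and suffixes later ones with .1, .2, and so on, and assigning into dat[] puts the values back without changing the shape. The data frame is rebuilt here with data.frame() only so that the snippet is self-contained.

```r
# Rebuild the question's data (assumption: character columns, not factors)
dat <- data.frame(col1 = c("A", "A", "B", "C", "E"),
                  col2 = c("B", "C", "A", "D", "F"),
                  stringsAsFactors = FALSE)

# Flatten column by column: all of col1 first, then all of col2
v <- as.character(unlist(dat, use.names = FALSE))

# First occurrence stays as-is; repeats get .1, .2, ...
u <- make.unique(v)
u
# "A" "A.1" "B" "C" "E" "B.1" "C.1" "A.2" "D" "F"

# dat[] keeps the data frame structure and only replaces the cell values
dat[] <- u
dat
```

Note that this numbering differs from the output asked for in the question, where the first A is already A.1.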
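Answer 1 (score: 4)
The OP asks for the values in the data.frame to be unique across all columns. This is a strong indicator that the data would be better reshaped from wide to long format, where all data manipulation can be performed on one column instead of many.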
library(data.table)
DT <- data.table(data)
molten <- melt(DT, measure.vars = names(DT))[
, value := paste(value, rowid(value), sep = ".")]
molten
    variable value
 1:     col1   A.1
 2:     col1   A.2
 3:     col1   B.1
 4:     col1   C.1
 5:     col1   E.1
 6:     col2   B.2
 7:     col2   C.2
 8:     col2   A.3
 9:     col2   D.1
10:     col2   F.1
rowid() is a convenience function that generates a unique row id within each group.
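To make the "row id within each group" idea concrete, here is a small base R sketch that computes the same kind of running count with ave(). The vector x is simply the question's values in column order; the comparison against data.table::rowid() only runs if data.table happens to be installed.

```r
# The question's values, flattened column by column
x <- c("A", "A", "B", "C", "E", "B", "C", "A", "D", "F")

# Running count within each group of equal values
ids <- ave(seq_along(x), x, FUN = seq_along)
ids
# 1 2 1 1 1 2 2 3 1 1

# Optional cross-check against rowid(), skipped when data.table is missing
if (requireNamespace("data.table", quietly = TRUE)) {
  stopifnot(identical(data.table::rowid(x), ids))
}
```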
Further processing can continue in long format. Finally, the data can be reshaped back to wide format:
molten[, rn := rowid(variable)][, dcast(.SD, rn ~ variable)][, rn := NULL][]
   col1 col2
1:  A.1  B.2
2:  A.2  C.2
3:  B.1  A.3
4:  C.1  D.1
5:  E.1  F.1
Jaap's make.unique() approach can be used as well:
melt(DT, measure.vars = names(DT))[, value := make.unique(value)][]
    variable value
 1:     col1     A
 2:     col1   A.1
 3:     col1     B
 4:     col1     C
 5:     col1     E
 6:     col2   B.1
 7:     col2   C.1
 8:     col2   A.2
 9:     col2     D
10:     col2     F
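Answer 2 (score: 2)
One option is to unlist the dataset, then use ave to get the sequence within each value, paste it onto the unlisted vector, and assign the result back to the original dataset. Values that occur only once get an empty suffix, and sub then strips the trailing dot.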
v1 <- as.character(unlist(data))
data[] <- sub("\\.$", "", paste(v1, ave(v1, v1,
FUN = function(x) if(length(x)>1) seq_along(x) else ""), sep="."))
data
# col1 col2
#1 A.1 B.2
#2 A.2 C.2
#3 B.1 A.3
#4 C.1 D
#5 E F
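The same idea can be wrapped in a small function, which makes it easier to reuse and to check. This is only a sketch: the name number_repeats is hypothetical, and it assumes the data do not already contain values such as "A.1" that could collide with the generated suffixes.

```r
# Hypothetical helper: suffix every value that occurs more than once
# with its running count, counting column by column
number_repeats <- function(df) {
  v   <- as.character(unlist(df, use.names = FALSE))  # all cells, column-major
  idx <- ave(seq_along(v), v, FUN = seq_along)        # running count per value
  n   <- ave(seq_along(v), v, FUN = length)           # total count per value
  df[] <- ifelse(n > 1, paste(v, idx, sep = "."), v)  # leave singletons untouched
  df
}

data <- data.frame(col1 = c("A", "A", "B", "C", "E"),
                   col2 = c("B", "C", "A", "D", "F"),
                   stringsAsFactors = FALSE)

res <- number_repeats(data)
res

# 0 means no duplicated cell remains anywhere in the data frame
anyDuplicated(unlist(res))
```

Computing the total count per value up front avoids the paste-then-strip step, at the cost of a second ave() call.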