I have a vector of names:
> dput(vec_dup)
c("Mark", "Simon", "Marcus", "Greg", "Simon", "Greg", "Marta",
"Marta", "Tim", "Tim", "Greg", "Tom", "Tom", "Greg")
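Some names are repeated in this vector. I want to append a suffix _1, _2, _3 to each string. The number appended depends on how many times the name has appeared in the vector up to and including that position.
Desired output: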
vec_output <- c("Mark_1", "Simon_1", "Marcus_1", "Greg_1", "Simon_2", "Greg_2", "Marta_1",
"Marta_2", "Tim_1", "Tim_2", "Greg_3", "Tom_1", "Tom_2", "Greg_4")
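As you can see, it is not only the duplicated strings that get a suffix: Marcus appears only once in the vector but should still become Marcus_1. How can I do this efficiently for thousands of strings?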
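Answer 0 (score: 1)
I use table to get the number of occurrences of each name, store it in a data.frame, and then use match to map the required column back onto the original vector. Note that this appends each name's total count (every Greg becomes Greg_4), not a running index: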
nam <- c("Mark", "Simon", "Marcus", "Greg", "Simon", "Greg", "Marta",
"Marta", "Tim", "Tim", "Greg", "Tom", "Tom", "Greg")
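# Count how often each name occurs (total count, not a running index)
tab <- table(nam)
occ <- data.frame(name = names(tab), occ = as.numeric(tab))
occ$res <- paste(occ$name, occ$occ, sep = "_")
# Look up the suffixed name for every element of the original vector
res <- occ[match(nam, occ$name), "res"]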
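The table-based code above attaches each name's total count, so every Greg ends up as Greg_4. Below is a minimal base-R sketch of a running-index variant, assuming a sort by name followed by rle and sequence is acceptable. The names ord, runs, idx, and res_running are illustrative and not part of the answer.

```r
nam <- c("Mark", "Simon", "Marcus", "Greg", "Simon", "Greg", "Marta",
         "Marta", "Tim", "Tim", "Greg", "Tom", "Tom", "Greg")

# Sort by name; the second key keeps equal names in their original order
ord <- order(nam, seq_along(nam))

# Run lengths of the sorted names give the size of each group
runs <- rle(nam[ord])

# sequence() expands the group sizes into 1..n counters per group,
# which are then written back to the original positions
idx <- integer(length(nam))
idx[ord] <- sequence(runs$lengths)

res_running <- paste(nam, idx, sep = "_")
print(res_running)
```

This avoids calling a function once per group, which may help when there are many distinct names, though that has not been measured here.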
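Answer 1 (score: 1)
Given your requirements, you can use ave to group identical names and paste on a suffix based on each element's position within its group, i.e.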
ave(vec_dup, vec_dup, FUN = function(i) paste0(i, '_', seq_along(i)))
#[1] "Mark_1" "Simon_1" "Marcus_1" "Greg_1" "Simon_2" "Greg_2" "Marta_1" "Marta_2" "Tim_1" "Tim_2" "Greg_3" "Tom_1" "Tom_2"
#[14] "Greg_4"
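A closely related sketch, assuming you would rather compute the running index separately and paste it on afterwards (the names idx and res are illustrative): ave is applied to the positions rather than to the strings, so the counter can be reused elsewhere.

```r
vec_dup <- c("Mark", "Simon", "Marcus", "Greg", "Simon", "Greg", "Marta",
             "Marta", "Tim", "Tim", "Greg", "Tom", "Tom", "Greg")

# For each element, its position within the group of identical names
idx <- ave(seq_along(vec_dup), vec_dup, FUN = seq_along)

# Attach the counter as a suffix
res <- paste0(vec_dup, "_", idx)
print(res)
```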
If you don't need a suffix on every element and only want to tell the duplicates apart, then make.unique is enough, i.e.
make.unique(vec_dup, sep = '_')
#[1] "Mark" "Simon" "Marcus" "Greg" "Simon_1" "Greg_1" "Marta" "Marta_1" "Tim" "Tim_1" "Greg_2" "Tom" "Tom_1" "Greg_3"
Answer 2 (score: 0)
Like make.unique, this leaves values that occur only once without an index, but numbers repeated values starting from _1:
string<-c("Mark", "Simon", "Marcus", "Greg", "Simon", "Greg", "Marta",
"Marta", "Tim", "Tim", "Greg", "Tom", "Tom", "Greg")
mstring <- make.unique(as.character(string), sep="_" )
tmp <- !duplicated(string)
for (i in 1:length(mstring[tmp])){
mstring[tmp][i]<-ifelse(string[tmp][i] %in% string[duplicated(string)], gsub("(.*)","\\1_0", mstring[tmp][i]),
mstring[tmp][i]
)
}
end <- sub(".*_([0-9]+)","\\1",grep("_([0-9]*)$",mstring,value=T) )
beg <- sub("(.*_)[0-9]+","\\1",grep("_([0-9]*)$",mstring,value=T) )
newend <- as.numeric(end)+1
mstring[grep("_([0-9]*)$",mstring)] <- paste0(beg,newend)
mstring
# "Mark" "Simon_1" "Marcus" "Greg_1" "Simon_2" "Greg_2" "Marta_1" "Marta_2" "Tim_1" "Tim_2" "Greg_3" "Tom_1" "Tom_2" "Greg_4"
Answer 3 (score: 0)
Using data.table::rowid():
library(data.table)
paste(vec_dup, rowid(vec_dup), sep = "_")
# [1] "Mark_1" "Simon_1" "Marcus_1" "Greg_1" "Simon_2" "Greg_2"
# [7] "Marta_1" "Marta_2" "Tim_1" "Tim_2" "Greg_3" "Tom_1"
# [13] "Tom_2" "Greg_4"
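Since the question asks about thousands of strings, here is a rough timing sketch on made-up data. The sample size, the seed, and the names pool, big, t_ave, and t_rowid are assumptions for illustration. Timings depend on the machine, so no results are quoted.

```r
# Hypothetical test data: 100,000 names drawn with replacement
set.seed(42)
pool <- c("Mark", "Simon", "Marcus", "Greg", "Marta", "Tim", "Tom")
big <- sample(pool, 1e5, replace = TRUE)

# Base R: running index per name via ave()
t_ave <- system.time(
  res_ave <- paste(big, ave(seq_along(big), big, FUN = seq_along), sep = "_")
)

# data.table::rowid(), only if the package happens to be installed
if (requireNamespace("data.table", quietly = TRUE)) {
  t_rowid <- system.time(
    res_rowid <- paste(big, data.table::rowid(big), sep = "_")
  )
  # Both approaches should produce exactly the same strings
  stopifnot(identical(res_ave, res_rowid))
  print(c(ave = t_ave[["elapsed"]], rowid = t_rowid[["elapsed"]]))
} else {
  print(t_ave[["elapsed"]])
}
```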