Add specific characters to duplicated strings

Time: 2018-02-16 10:42:19

Tags: r

I have a vector of names:

> dput(vec_dup)
c("Mark", "Simon", "Marcus", "Greg", "Simon", "Greg", "Marta", 
"Marta", "Tim", "Tim", "Greg", "Tom", "Tom", "Greg")

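Some names are repeated in this vector. I want to append a suffix (_1, _2, _3, ...) to every string. The number appended is the count of how many times that name has occurred in the vector up to and including that position.

Desired output: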

vec_output <- c("Mark_1", "Simon_1", "Marcus_1", "Greg_1", "Simon_2", "Greg_2", "Marta_1", 
                "Marta_2", "Tim_1", "Tim_2", "Greg_3", "Tom_1", "Tom_2", "Greg_4")

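As you can see, this applies not only to the duplicated strings: Marcus occurs only once in the vector but should still get _1. How can I do this efficiently for thousands of strings?

4 answers:

Answer 0: (score: 1)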

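I use table to get the number of occurrences of each name, store it in a data.frame, and then use match to look up the required column for each element of the original vector. Note that table returns the total count per name, so every occurrence of a name gets the same suffix (for example, every "Greg" becomes "Greg_4") rather than the running index from the desired output: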

nam <- c("Mark", "Simon", "Marcus", "Greg", "Simon", "Greg", "Marta", 
        "Marta", "Tim", "Tim", "Greg", "Tom", "Tom", "Greg")
occ <- data.frame("name" = names(table(nam)),
                  "occ" = as.numeric(table(nam)))

occ$res <- paste(occ$name, occ$occ, sep = "_")

res <- occ[match(nam, occ$name), "res"]
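If a running index per name is wanted, the per-name counts can still be used. The following is only a minimal base R sketch, not part of the original answer. It assumes the vector is called nam as above; the names f, counts, o, idx and res_running are introduced here for illustration only.

```r
nam <- c("Mark", "Simon", "Marcus", "Greg", "Simon", "Greg", "Marta",
         "Marta", "Tim", "Tim", "Greg", "Tom", "Tom", "Greg")

f <- factor(nam)               # one level per distinct name
counts <- as.vector(table(f))  # total occurrences per name, in level order
o <- order(f)                  # positions grouped by name; ties keep their original order

idx <- integer(length(nam))
idx[o] <- sequence(counts)     # 1..n within each name, written back to the original positions

res_running <- paste(nam, idx, sep = "_")
res_running
```

Because order leaves ties in their original order, the first "Greg" in the vector gets _1, the second gets _2, and so on.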

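Answer 1: (score: 1)

Per your requirements, you can use ave to group identical words and paste on a suffix based on each element's position within its group, i.e.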

ave(vec_dup, vec_dup, FUN = function(i) paste0(i, '_', seq_along(i)))
#[1] "Mark_1"   "Simon_1"  "Marcus_1" "Greg_1"   "Simon_2"  "Greg_2"   "Marta_1"  "Marta_2"  "Tim_1"    "Tim_2"    "Greg_3"   "Tom_1"    "Tom_2"   
#[14] "Greg_4"
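To make the grouping explicit, the same result can be built by hand with split, lapply and unsplit. This is only an illustrative sketch of roughly what the ave call does, not how it has to be written. The names groups, suffixed and out are assumptions introduced here.

```r
vec_dup <- c("Mark", "Simon", "Marcus", "Greg", "Simon", "Greg", "Marta",
             "Marta", "Tim", "Tim", "Greg", "Tom", "Tom", "Greg")

# One character vector per distinct name, elements kept in their original order
groups <- split(vec_dup, vec_dup)

# Within each group, append the position of the element in that group
suffixed <- lapply(groups, function(i) paste0(i, "_", seq_along(i)))

# Put the suffixed elements back into their original positions
out <- unsplit(suffixed, vec_dup)
out
```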

If you don't care about adding a suffix to every element and only want to distinguish the duplicates, then make.unique alone is enough, i.e.

make.unique(vec_dup, sep = '_')
#[1] "Mark"    "Simon"   "Marcus"  "Greg"    "Simon_1" "Greg_1"  "Marta"   "Marta_1" "Tim"     "Tim_1"   "Greg_2"  "Tom"     "Tom_1"   "Greg_3"

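Answer 2: (score: 0)

Like make.unique, this leaves unique values without an index, but numbers the duplicated names starting at _1 from their first occurrence: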

string <- c("Mark", "Simon", "Marcus", "Greg", "Simon", "Greg", "Marta",
            "Marta", "Tim", "Tim", "Greg", "Tom", "Tom", "Greg")
# make.unique leaves first occurrences bare and numbers later ones from _1
mstring <- make.unique(as.character(string), sep = "_")
# First occurrences of names that appear again later get a temporary _0 suffix
first_of_dup <- !duplicated(string) & string %in% string[duplicated(string)]
mstring[first_of_dup] <- paste0(mstring[first_of_dup], "_0")
# Shift every numeric suffix up by one so numbering starts at _1
idx <- grep("_[0-9]+$", mstring)
beg <- sub("(.*_)[0-9]+$", "\\1", mstring[idx])
end <- sub(".*_([0-9]+)$", "\\1", mstring[idx])
mstring[idx] <- paste0(beg, as.numeric(end) + 1)
mstring
# "Mark"    "Simon_1" "Marcus"  "Greg_1"  "Simon_2" "Greg_2"  "Marta_1" "Marta_2" "Tim_1"   "Tim_2"   "Greg_3"  "Tom_1"   "Tom_2"   "Greg_4" 

Answer 3: (score: 0)

Using data.table::rowid():

library(data.table)
paste(vec_dup, rowid(vec_dup), sep = "_")
#  [1] "Mark_1"   "Simon_1"  "Marcus_1" "Greg_1"   "Simon_2"  "Greg_2"  
#  [7] "Marta_1"  "Marta_2"  "Tim_1"    "Tim_2"    "Greg_3"   "Tom_1"   
# [13] "Tom_2"    "Greg_4"