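I want to add a number indicating the x-th occurrence of a word in a vector. (This question therefore differs from Make a column with duplicated values unique in a dataframe: I have a plain vector and am trying to avoid the overhead of coercing it to a data.frame.)
For example, for the vector:
book, ship, umbrella, book, ship, ship
the output should be:
book, ship, umbrella, book2, ship2, ship3
I solved it myself by converting the vector to a data frame and then grouping. It feels like cracking a nut with a sledgehammer: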
# add a consecutive number for equal strings
library(dplyr)
words <- c("book", "ship", "umbrella", "book", "ship", "ship")
# convert the word vector to a data.frame for grouping
df <- data.frame(words = words, stringsAsFactors = FALSE)
df <- df %>% group_by(words) %>% mutate(seqN = row_number())
# combine columns, leaving the first occurrence unnumbered
# (safer than gsub("1", "", ...), which also strips any '1' inside a word or in "11")
ifelse(df$seqN == 1, df$words, paste0(df$words, df$seqN))
# [1] "book" "ship" "umbrella" "book2" "ship2" "ship3"
Is there a cleaner solution, for example using the stringr package?
Answer
You can still use row_number() from dplyr, but without converting to a data frame, e.g.:
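library(dplyr)
# number each word within its group; leave the first occurrence unnumbered
ave(words, words, FUN = function(i) ifelse(row_number(i) == 1, i, paste0(i, row_number(i))))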
#[1] "book" "ship" "umbrella" "book2" "ship2" "ship3"
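If you want to drop the dplyr dependency as well, the same ave() idea should work with base R's seq_along(). This is a sketch, not part of the original answer: ave() splits words by its own values, applies FUN to each group and puts the results back in the original order, so seq_along() yields a per-word counter. Because words is a character vector, the counters come back as strings, hence the as.integer().

```r
words <- c("book", "ship", "umbrella", "book", "ship", "ship")

# ave() groups `words` by its own values and applies seq_along within each group;
# the results are written back into a character vector, so convert to integer
counter <- as.integer(ave(words, words, FUN = seq_along))
counter
# [1] 1 1 1 2 2 3

# append the counter only from the second occurrence onward
result <- ifelse(counter == 1, words, paste0(words, counter))
result
# [1] "book" "ship" "umbrella" "book2" "ship2" "ship3"
```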
Another option is to use make.unique together with gsubfn to increment the numeric suffixes by 1, e.g.:
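library(gsubfn)
# make.unique() suffixes repeats with .1, .2, ...; gsubfn() then adds 1 to every run of digits
# (digits that are part of the words themselves are incremented as well)
gsubfn("\\d+", function(x) as.numeric(x) + 1, make.unique(words))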
#[1] "book" "ship" "umbrella" "book.2" "ship.2" "ship.3"
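To see what gsubfn is correcting for, it may help to look at make.unique() on its own: it leaves first occurrences alone and numbers the repeats starting from 1, which is why every suffix has to be shifted up by one. Its sep argument (default ".") sets the separator, so sep = "" gives book1-style names, though still counting from 1.

```r
words <- c("book", "ship", "umbrella", "book", "ship", "ship")

# repeats are suffixed with ".1", ".2", ...; first occurrences stay as-is
make.unique(words)
# [1] "book" "ship" "umbrella" "book.1" "ship.1" "ship.2"

# with sep = "" the number is appended directly, still starting at 1
make.unique(words, sep = "")
# [1] "book" "ship" "umbrella" "book1" "ship1" "ship2"
```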