I have two vectors:
> filename
[1] "10021978_1909-07-21_ed-1_seq-4" "10021978_1910-01-19_ed-1_seq-31"
[3] "10021978_1910-01-19_ed-1_seq-31" "10021978_1910-01-19_ed-1_seq-31"
[5] "10021978_1910-01-19_ed-1_seq-31" "10021978_1911-06-07_ed-1_seq-12"
[7] "10021978_1911-07-05_ed-1_seq-11" "10021978_1911-07-12_ed-1_seq-11"
[9] "10021978_1911-07-12_ed-1_seq-11" "10021978_1911-09-27_ed-1_seq-4"
and
> dups = duplicated(filename)
> dups
[1] FALSE FALSE TRUE TRUE TRUE FALSE FALSE FALSE TRUE FALSE
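I am exporting files and do not want to overwrite files that share a file name. This set of 10 contains some duplicates, so I need to make the file names unique.
How can I create a new vector that is empty wherever `dups` is `FALSE` and holds a counter wherever it is `TRUE`? The tricky part is that when several `TRUE` values sit next to each other, the counter has to start at 2 and increment, then reset when a `FALSE` appears. The vector I need for this set is: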
ans = c("", "", 2, 3, 4, "", "", "", 2, "")
That way I can append it to the file names to handle the duplicates. The final vector of file names I need is:
[1] "10021978_1909-07-21_ed-1_seq-4" "10021978_1910-01-19_ed-1_seq-31"
[3] "10021978_1910-01-19_ed-1_seq-31-2" "10021978_1910-01-19_ed-1_seq-31-3"
[5] "10021978_1910-01-19_ed-1_seq-31-4" "10021978_1911-06-07_ed-1_seq-12"
[7] "10021978_1911-07-05_ed-1_seq-11" "10021978_1911-07-12_ed-1_seq-11"
[9] "10021978_1911-07-12_ed-1_seq-11-2" "10021978_1911-09-27_ed-1_seq-4"
Thank you very much.
Answer 0 (score: 2)
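`make.unique` should be good enough, but if you need the numbering to start from 2, it may be easier to use `ave`.
Here is an example of both, so you can see the difference between the two approaches: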
a <- c("a", "a", "a", "b", "c", "d", "b", "d", "e")
make.unique(a, sep = "-")
# [1] "a" "a-1" "a-2" "b" "c" "d" "b-1" "d-1" "e"
dups <- ave(a, a, FUN = seq_along)
a[duplicated(a)] <- paste(a[duplicated(a)], dups[duplicated(a)], sep = "-")
a
# [1] "a" "a-2" "a-3" "b" "c" "d" "b-2" "d-2" "e"
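Applied to the file names from the question, the `ave` approach might look like the sketch below. The names `idx`, `suffix`, and `unique_names` are illustrative, not part of the original answer. Note that `ave` numbers by group of identical names rather than by adjacent runs of `TRUE`; the two coincide here because identical names sit next to each other.

```r
# The ten file names from the question.
filename <- c(
  "10021978_1909-07-21_ed-1_seq-4",  "10021978_1910-01-19_ed-1_seq-31",
  "10021978_1910-01-19_ed-1_seq-31", "10021978_1910-01-19_ed-1_seq-31",
  "10021978_1910-01-19_ed-1_seq-31", "10021978_1911-06-07_ed-1_seq-12",
  "10021978_1911-07-05_ed-1_seq-11", "10021978_1911-07-12_ed-1_seq-11",
  "10021978_1911-07-12_ed-1_seq-11", "10021978_1911-09-27_ed-1_seq-4"
)

# Occurrence number of each name within its group of identical names.
# ave() returns a character vector here because filename is character.
idx <- ave(filename, filename, FUN = seq_along)

# TRUE for every repeat of a name that has already been seen.
dups <- duplicated(filename)

# Suffix vector: empty for first occurrences, occurrence number otherwise.
suffix <- ifelse(dups, idx, "")

# Append "-<n>" only to the duplicated entries.
unique_names <- ifelse(dups, paste(filename, idx, sep = "-"), filename)
```

Here `suffix` corresponds to the `ans` vector the question asks for (as character values), and `unique_names` is the final vector of file names.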