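Hi everyone, I have a df like this:
df = data.frame("Description" = c("Miriam", "Miriam", "Miriam", "Trump", "Trump", "Trump", "Right", "Right", "Right", "Sara", "Sara", "Star", "Star", "Star", "Sandra"))
I would like to write a loop that adds a new column assigning a sample number to each group of rows that share the same name, so that I get the following result: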
Description SampleID
Miriam sample1
Miriam sample1
Miriam sample1
Trump sample2
Trump sample2
Trump sample2
Right sample3
Right sample3
Right sample3
Sara sample4
Sara sample4
Star sample5
Star sample5
Star sample5
Sandra sample6
Does anyone know how to do this? Thanks a lot, any help would be appreciated. Andrea
Answer 0 (score: 1)
One dplyr possibility could be:
library(dplyr)

df %>%
  mutate(SampleID = paste0("sample",
                           cumsum(Description != lag(Description, default = first(Description))) + 1))
Description SampleID
1 Miriam sample1
2 Miriam sample1
3 Miriam sample1
4 Trump sample2
5 Trump sample2
6 Trump sample2
7 Right sample3
8 Right sample3
9 Right sample3
10 Sara sample4
11 Sara sample4
12 Star sample5
13 Star sample5
14 Star sample5
15 Sandra sample6
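One thing worth noting about the cumsum/lag idea: it numbers runs of consecutive equal values, so it appears to rely on identical names sitting next to each other, as they do in the question. The sketch below uses base R only and a made-up vector `x` (not from the question) in which a name reappears later, to compare run-based numbering with name-based numbering via `match`.

```r
# Hypothetical input (an assumption for illustration): "Miriam" reappears after "Trump"
x <- c("Miriam", "Miriam", "Trump", "Miriam")

# Run-based numbering: TRUE marks the first element and every change of value
run_id <- cumsum(c(TRUE, x[-1] != x[-length(x)]))

# Name-based numbering: position of each value among the distinct values
name_id <- match(x, unique(x))

paste0("sample", run_id)   # "sample1" "sample1" "sample2" "sample3"
paste0("sample", name_id)  # "sample1" "sample1" "sample2" "sample1"
```

If the names in your data are always contiguous, both give the same result; otherwise pick the one that matches what a "sample" means in your data.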
Answer 1 (score: 1)
We can use match to match the values in Description against all of its unique values, which creates a unique ID per name, and then paste "Sample" in front of that ID.
df$SampleID <- paste0("Sample", match(df$Description, unique(df$Description)))
df
# Description SampleID
#1 Miriam Sample1
#2 Miriam Sample1
#3 Miriam Sample1
#4 Trump Sample2
#5 Trump Sample2
#6 Trump Sample2
#7 Right Sample3
#8 Right Sample3
#9 Right Sample3
#10 Sara Sample4
#11 Sara Sample4
#12 Star Sample5
#13 Star Sample5
#14 Star Sample5
#15 Sandra Sample6
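To make the one-liner above easier to follow, here is a rough step-by-step sketch of the same idea. The intermediate names `description`, `lookup`, `idx` and `sample_id` are only illustrative and do not appear in the answer.

```r
# Same names as in the question, held in a plain character vector
description <- c("Miriam", "Miriam", "Miriam", "Trump", "Trump", "Trump",
                 "Right", "Right", "Right", "Sara", "Sara",
                 "Star", "Star", "Star", "Sandra")

# Step 1: distinct names, in order of first appearance
lookup <- unique(description)
# "Miriam" "Trump" "Right" "Sara" "Star" "Sandra"

# Step 2: for every row, the position of its name in the lookup vector
idx <- match(description, lookup)
# 1 1 1 2 2 2 3 3 3 4 4 5 5 5 6

# Step 3: prepend the label to build the final ID
sample_id <- paste0("Sample", idx)
```

All three functions are vectorized, so no explicit loop is needed even though the question asks for one.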
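Answer 2 (score: 1)
Your column is already a factor (internally integers, one per level of the factor), provided the data frame was created with stringsAsFactors = TRUE, which was the default before R 4.0.0. You only need to order the levels to fit your needs and then use as.numeric: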
df$sampleID <- paste0("Sample",
                      as.numeric(factor(df$Description,
                                        levels=unique(df$Description), ordered=TRUE)))
df
# Description sampleID
#1 Miriam Sample1
#2 Miriam Sample1
#3 Miriam Sample1
#4 Trump Sample2
#5 Trump Sample2
#6 Trump Sample2
#7 Right Sample3
#8 Right Sample3
#9 Right Sample3
#10 Sara Sample4
#11 Sara Sample4
#12 Star Sample5
#13 Star Sample5
#14 Star Sample5
#15 Sandra Sample6
NB: If you apply as.numeric to the column without doing anything else, you already get an index for each name, just not in the order you want:
as.numeric(df$Description)
# [1] 1 1 1 6 6 6 2 2 2 4 4 5 5 5 3
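On R 4.0.0 or later, `data.frame()` keeps strings as character by default, so calling `as.numeric` directly on the column would not give these codes. A minimal sketch of the same idea with an explicit conversion to factor follows, using a shortened, hypothetical version of the data (one row per name) and assuming the usual alphabetical sort order for these names.

```r
# Character column, as produced by default since R 4.0.0 (shortened example data)
df <- data.frame(Description = c("Miriam", "Trump", "Right", "Sara", "Star", "Sandra"),
                 stringsAsFactors = FALSE)

# Default factor levels are sorted, so the codes follow alphabetical order
alpha_id <- as.integer(factor(df$Description))
# 1 6 2 4 5 3

# Levels supplied in order of first appearance give the numbering asked for
first_seen_id <- as.integer(factor(df$Description, levels = unique(df$Description)))
# 1 2 3 4 5 6
```

The second call is the same technique as in the answer, just without relying on the column already being a factor.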