How to assign the same name to identical values in a df column using R

Time: 2019-04-12 09:18:55

Tags: r loops

Hi everyone, I have a df like this:

  

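df <- data.frame(Description = c("Miriam", "Miriam", "Miriam", "Trump", "Trump", "Trump",
                                 "Right", "Right", "Right", "Sara", "Sara",
                                 "Star", "Star", "Star", "Sandra"),
                 stringsAsFactors = TRUE)  # factor column, the default before R 4.0.0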

I would like to write a loop that creates a new column assigning the same sample number to every row with the same name, to get the following result:

Description SampleID
Miriam  sample1
Miriam  sample1
Miriam  sample1
Trump   sample2
Trump   sample2
Trump   sample2
Right   sample3
Right   sample3
Right   sample3
Sara    sample4
Sara    sample4
Star    sample5
Star    sample5
Star    sample5
Sandra  sample6

Does anyone know how to do this? Thanks a lot, any help is appreciated. Andrea

3 answers:

Answer 0 (score: 1)

One dplyr possibility, which starts a new sample number every time the name changes from one row to the next:

library(dplyr)

df %>%
  mutate(SampleID = paste0("sample",
                           cumsum(Description != lag(Description, default = first(Description))) + 1))

   Description SampleID
1       Miriam  sample1
2       Miriam  sample1
3       Miriam  sample1
4        Trump  sample2
5        Trump  sample2
6        Trump  sample2
7        Right  sample3
8        Right  sample3
9        Right  sample3
10        Sara  sample4
11        Sara  sample4
12        Star  sample5
13        Star  sample5
14        Star  sample5
15      Sandra  sample6
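Note that this counts runs of consecutive identical names, not distinct names: if a name reappeared further down the column, it would get a new number. A minimal base-R sketch on a made-up vector (hypothetical data, not from the question) contrasts the run-based idea above with a value-based match():

```r
# Hypothetical example: "Miriam" reappears after "Trump"
desc <- c("Miriam", "Miriam", "Trump", "Trump", "Miriam")

# Run-based: TRUE for the first row and wherever the value differs from the
# previous row; cumsum() turns those change flags into run numbers
run_id <- cumsum(c(TRUE, desc[-1] != desc[-length(desc)]))

# Value-based: every occurrence of a name gets the position of that name
# among the unique values (in order of first appearance)
val_id <- match(desc, unique(desc))

run_id  # 1 1 2 2 3
val_id  # 1 1 2 2 1
```

For the data in the question both give the same result, because each name appears in a single consecutive block.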

Answer 1 (score: 1)

We can use match to match the values in Description against all the unique values, which gives a unique ID per name, and then paste it to "Sample".

df$SampleID <- paste0("Sample", match(df$Description, unique(df$Description)))


df
#   Description SampleID
#1       Miriam  Sample1
#2       Miriam  Sample1
#3       Miriam  Sample1
#4        Trump  Sample2
#5        Trump  Sample2
#6        Trump  Sample2
#7        Right  Sample3
#8        Right  Sample3
#9        Right  Sample3
#10        Sara  Sample4
#11        Sara  Sample4
#12        Star  Sample5
#13        Star  Sample5
#14        Star  Sample5
#15      Sandra  Sample6
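Since the question asks for a loop, here is an explicit for-loop sketch that should build the same column as the match() one-liner. It is only an illustration (the vectorised match() version is the idiomatic choice), and the helper vector `seen` is a hypothetical name:

```r
df <- data.frame(Description = c("Miriam", "Miriam", "Miriam", "Trump", "Trump", "Trump",
                                 "Right", "Right", "Right", "Sara", "Sara",
                                 "Star", "Star", "Star", "Sandra"))

seen <- character(0)             # hypothetical helper: names in order of first appearance
df$SampleID <- NA_character_
for (i in seq_len(nrow(df))) {
  name <- as.character(df$Description[i])   # works for character or factor columns
  if (!(name %in% seen)) seen <- c(seen, name)  # new name: append it
  df$SampleID[i] <- paste0("Sample", which(seen == name))  # its position is the ID
}
df
```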

Answer 2 (score: 1)

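Your column is already a factor (before R 4.0.0, data.frame() converted strings to factors by default), and a factor is stored as integer codes pointing to its levels. You only need to order the levels the way you want and use as.numeric: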

df$sampleID <- paste0("Sample", 
                      as.numeric(factor(df$Description, 
                                        levels=unique(df$Description), ordered=TRUE)))

df
#   Description sampleID
#1       Miriam  Sample1
#2       Miriam  Sample1
#3       Miriam  Sample1
#4        Trump  Sample2
#5        Trump  Sample2
#6        Trump  Sample2
#7        Right  Sample3
#8        Right  Sample3
#9        Right  Sample3
#10        Sara  Sample4
#11        Sara  Sample4
#12        Star  Sample5
#13        Star  Sample5
#14        Star  Sample5
#15      Sandra  Sample6
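A quick check on a small made-up vector that the factor approach yields the same integer codes as match(desc, unique(desc)); `ordered = TRUE` should not be needed for this, since as.integer() on a plain factor already returns the level codes:

```r
desc <- c("Miriam", "Miriam", "Trump", "Right", "Trump", "Sandra")  # hypothetical data

# Levels in order of first appearance, then take the integer codes
codes_factor <- as.integer(factor(desc, levels = unique(desc)))

# Same idea via match()
codes_match <- match(desc, unique(desc))

identical(codes_factor, codes_match)  # TRUE
codes_match                           # 1 1 2 3 2 4
```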

NB:

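If you apply as.numeric to the factor column without doing anything else, you already get an index for each name, just not in the order you want (by default the levels are sorted alphabetically):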

as.numeric(df$Description)
# [1] 1 1 1 6 6 6 2 2 2 4 4 5 5 5 3
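Because R 4.0.0 changed the default of stringsAsFactors in data.frame() to FALSE, the NB above only holds when the column is a factor. A small sketch with made-up names that forces the conversion explicitly, so it behaves the same on any R version:

```r
# Default behaviour depends on the R version:
# "character" in R >= 4.0.0, "factor" in earlier versions
df_default <- data.frame(Description = c("Miriam", "Trump", "Right"))
class(df_default$Description)

# Forcing the conversion reproduces the pre-4.0.0 behaviour
df_factor <- data.frame(Description = c("Miriam", "Trump", "Right"),
                        stringsAsFactors = TRUE)
levels(df_factor$Description)      # "Miriam" "Right" "Trump" (alphabetical)
as.numeric(df_factor$Description)  # 1 3 2
```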