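How to select and replace multiple values of a character variable with numbers, based on their starting letter, in R

Time: 2019-09-25 12:37:14

Tags: r regex dplyr data.table

This is my dataset, with the following values: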

This works only for a single letter, i.e. a, h, s, or n:

library(data.table)

df <- as.data.table(c("hello","name","age","hey","apron","street","night","soap"))

colnames(df) <- "V1"

Output:

  V1
1  2
2  4
3  1
4  2
5  1
6  3
7  4
8  3

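I want to replace multiple values at once. For example, I want to replace every word whose first letter falls in the range a-h with 1. When I try this, I get NA values.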

library(dplyr)

# Recode each word by its first letter; inside mutate(), refer to V1 directly
df %>%
  mutate(V1 = case_when(startsWith(V1, "a") ~ '1',
                        startsWith(V1, "h") ~ '2',
                        startsWith(V1, "s") ~ '3',
                        startsWith(V1, "n") ~ '4'))

      V1 V2
1  hello  2
2   name  4
3    age  1
4    hey  2
5  apron  1
6 street  3
7  night  4
8   soap  3

2 answers:

Answer 0: (score: 0)

library(dplyr)
df %>%
  mutate(V2 = case_when(substr(V1, 1, 1) %in% letters[1:8] ~ "1",
                        substr(V1, 1, 1) == "s" ~ "3",
                        substr(V1, 1, 1) == "n" ~ "4"))
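
For readers without dplyr, the same first-letter mapping can be sketched in base R. This is a minimal illustration, not the answerer's code: the names `words`, `first`, and `code` are made up for the example, and the word list is copied from the question. It also shows that `startsWith()` compares literal prefixes, so a range such as a-h has to be spelled out some other way, for example with `%in% letters[1:8]`.

```r
# Words from the question; `words`, `first`, and `code` are illustrative names.
words <- c("hello", "name", "age", "hey", "apron", "street", "night", "soap")
first <- substr(words, 1, 1)          # first letter of each word

# startsWith() matches literal prefixes only, so "a-h" is not read as a range.
startsWith("apron", "a-h")            # FALSE

# Start from NA so that words matching no rule are easy to spot.
code <- rep(NA_character_, length(words))
code[first %in% letters[1:8]] <- "1"  # first letter is a through h
code[first == "s"] <- "3"
code[first == "n"] <- "4"

data.frame(V1 = words, V2 = code)
```

Any word whose first letter is not covered by one of the three rules keeps the value NA, which is the same thing `case_when()` does when no condition matches.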

Answer 1: (score: 0)

Using a regex approach:

df %>%
  mutate(V2 = case_when(grepl("^[a-h].*", V1) ~ "1",
                        grepl("^s.*", V1) ~ "3",
                        grepl("^n.*", V1) ~ "4"))

      V1 V2
1  hello  1
2   name  4
3    age  1
4    hey  1
5  apron  1
6 street  3
7  night  4
8   soap  3
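
If more letter ranges are needed later, the regex idea can be wrapped in a small base R helper. The sketch below is an assumption-laden illustration rather than part of either answer: `code_by_pattern`, its `default` argument, and the `patterns` vector are hypothetical names. Each pattern is anchored with `^`, and the character class `[a-h]` covers the whole range, so the trailing `.*` is not required for `grepl()` to match.

```r
# Hypothetical helper: label each element of x with the name of the first
# regular expression in `patterns` that matches it.
code_by_pattern <- function(x, patterns, default = NA_character_) {
  out <- rep(default, length(x))
  # Apply the patterns in reverse order so that earlier patterns overwrite
  # later ones, giving the same "first match wins" result as case_when().
  for (i in rev(seq_along(patterns))) {
    out[grepl(patterns[i], x)] <- names(patterns)[i]
  }
  out
}

words <- c("hello", "name", "age", "hey", "apron", "street", "night", "soap")

# Names are the replacement values, elements are the anchored patterns.
patterns <- c("1" = "^[a-h]", "3" = "^s", "4" = "^n")

data.frame(V1 = words, V2 = code_by_pattern(words, patterns))
```

Passing a `default` such as `"other"` gives unmatched words an explicit label instead of NA.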