Replace elements of a vector based on the first letter of a string

Date: 2016-09-22 11:32:49

Tags: r regex replace

Consider the following vectors:

ID <- c("A1","B1","C1","A12","B2","C2","Av1")

names <- c("ALPHA","BRAVO","CHARLIE","AVOCADO")

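I want to replace the first letter of each element of vector ID with the matching element of vector names, based on that first letter. I also want to add _0 before each number from 0 to 9. Numbers of 10 or more, like the 12 in A12, only get the _, as the expected result shows.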

Note that the elements Av1 and AVOCADO complicate things a bit, especially the lowercase v in Av1.
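A quick way to see why Av1 is the awkward case is to compare taking only the first character of each ID with stripping the digits to keep the whole letter prefix. This is a minimal base R sketch; the variable names first_char and prefix are illustrative and do not come from the question.

```r
ID <- c("A1", "B1", "C1", "A12", "B2", "C2", "Av1")

# Illustrative helper names, not from the question.
# Taking only the first character makes "Av1" indistinguishable from "A1".
first_char <- substr(ID, 1, 1)
print(first_char)

# Removing the run of digits keeps the whole letter prefix, so "Av1" stays "Av".
prefix <- sub("\\d+", "", ID)
print(prefix)
```

Matching on the full prefix rather than the first character is what lets Av1 be told apart from A1.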

The result should look like this:

res <- c("ALPHA_01","BRAVO_01","CHARLIE_01","ALPHA_12","BRAVO_02","CHARLIE_02", "AVOCADO_01")

I know this should be done with regex, but I have been trying for 2 days now and can't get anywhere.

2 Answers:

Answer 0 (score: 1)

We can use gsubfn:

library(gsubfn)
#remove the number part from 'ID' (using `sub`) and get the unique elements
#(this assumes the unique prefixes appear in the same order as 'names')
nm1 <- unique(sub("\\d+", "", ID))
#using gsubfn, replace the non-numeric elements with the matching 
#key/value pair in the replacement list
#finally insert "_0" before the trailing digits with sub
sub("(\\d+)$", "_0\\1", gsubfn("(\\D+)", as.list(setNames(names, nm1)), ID))
#[1] "ALPHA_01"   "BRAVO_01"   "CHARLIE_01" "ALPHA_012" 
#[5] "BRAVO_02"   "CHARLIE_02" "AVOCADO_01"

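(\\d+) matches one or more digits, and (\\D+) matches one or more non-digit characters. We wrap the pattern in parentheses to capture it as a group, and refer to that group in the replacement with the backreference \\1 (since it is the first capture group).

Update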
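The pieces that the gsubfn call combines can be checked one at a time in base R. The sketch below is an illustration, not the answer's own code: it pulls out the (\\D+) and (\\d+) matches with regmatches, replaces the letter prefix through a named vector (the same name-to-value pairing that as.list(setNames(names, nm1)) hands to gsubfn), and then applies the "_0\\1" backreference. Like the answer, it assumes the unique prefixes in ID appear in the same order as names; letters_part, digits_part, lookup, mapped and res0 are hypothetical helper names.

```r
ID <- c("A1", "B1", "C1", "A12", "B2", "C2", "Av1")
names <- c("ALPHA", "BRAVO", "CHARLIE", "AVOCADO")

# Hypothetical helper names. (\\D+) grabs the run of non-digits,
# (\\d+) the run of digits, in each element of ID.
letters_part <- regmatches(ID, regexpr("\\D+", ID))
digits_part  <- regmatches(ID, regexpr("\\d+", ID))

# Named lookup equivalent to as.list(setNames(names, nm1)):
# assumes the unique prefixes appear in the same order as 'names'.
nm1 <- unique(sub("\\d+", "", ID))
lookup <- setNames(names, nm1)
mapped <- unname(lookup[letters_part])

# \\1 re-inserts the captured trailing digits after the literal "_0".
res0 <- sub("(\\d+)$", "_0\\1", paste0(mapped, digits_part))
print(res0)
```

Because "_0" is added unconditionally, A12 becomes ALPHA_012 here, which is the case the Update to this answer handles.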

If the requirement is to add the 0 only to IDs whose number is less than 10, we can do that with sprintf inside a second gsubfn call:

gsubfn("(\\d+)", ~sprintf("_%02d", as.numeric(x)), 
                      gsubfn("(\\D+)", as.list(setNames(names, nm1)), ID))
#[1] "ALPHA_01"   "BRAVO_01"   "CHARLIE_01" "ALPHA_12" 
#[5]  "BRAVO_02"   "CHARLIE_02" "AVOCADO_01"
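The %02d format is what makes the update work: it pads an integer to a minimum width of two with leading zeros, so single digits gain a 0 while 12 is left alone. The sketch below is a base R alternative offered for illustration, not the answer's gsubfn code. It shows the padding on its own and then builds the full result with a named-vector lookup, again assuming the unique prefixes follow the order of names; prefix, num and out are illustrative names.

```r
ID <- c("A1", "B1", "C1", "A12", "B2", "C2", "Av1")
names <- c("ALPHA", "BRAVO", "CHARLIE", "AVOCADO")

# %02d: minimum width 2, zero-padded, so only numbers below 10 gain a 0.
padded <- sprintf("_%02d", c(1, 2, 12))
print(padded)

# Illustrative base R version without gsubfn; assumes the unique
# prefixes appear in the same order as 'names'.
prefix <- sub("\\d+", "", ID)
num <- as.numeric(sub("\\D+", "", ID))
out <- paste0(unname(setNames(names, unique(prefix))[prefix]), sprintf("_%02d", num))
print(out)
```

sprintf accepts doubles for %d as long as they hold whole numbers, which is why as.numeric is enough here and in the answer.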

Answer 1 (score: 1)

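Doing this in base R, we can check whether the second character of each name is V (as in AVOCADO) and take a 2-character substring if so, or a 1-character substring if not. This gives AV for AVOCADO and A for ALPHA. We then match these keys against the letters extracted from ID (converted with toupper so that Av matches AV). Finally, we paste on the number found in each ID, formatted with sprintf as _ followed by at least two digits, so 1 becomes _01 while 12 stays _12.

paste0(names[match(toupper(sub('\\d+', '', ID)), 
               ifelse(substr(names, 2, 2) == 'V', substr(names, 1, 2), 
                                substr(names, 1, 1)))],
       sprintf('_%02d', as.numeric(sub('\\D+', '', ID))))
#[1] "ALPHA_01" "BRAVO_01" "CHARLIE_01" "ALPHA_12" "BRAVO_02" "CHARLIE_02" "AVOCADO_01"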
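To see what the match call in this answer works with, the two intermediate vectors can be printed on their own. This minimal sketch repeats the answer's ifelse and match expressions; keys and idx are illustrative names introduced here.

```r
ID <- c("A1", "B1", "C1", "A12", "B2", "C2", "Av1")
names <- c("ALPHA", "BRAVO", "CHARLIE", "AVOCADO")

# Illustrative name 'keys': two letters when the second letter is "V", else one.
keys <- ifelse(substr(names, 2, 2) == "V", substr(names, 1, 2), substr(names, 1, 1))
print(keys)

# Illustrative name 'idx': position in 'names' for each upper-cased ID prefix.
idx <- match(toupper(sub("\\d+", "", ID)), keys)
print(idx)
```

The two-letter rule is tailored to this data: any other name whose second letter is V would also receive a two-letter key.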