Consider the following vectors:
ID <- c("A1","B1","C1","A12","B2","C2","Av1")
names <- c("ALPHA","BRAVO","CHARLIE","AVOCADO")
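I want to replace the leading letter(s) of each element in the vector ID with the matching element of the vector names, based on the first letter of each element of names. I also want to insert _0 before each number in the range 0:9.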
Note that the elements Av1 and AVOCADO throw things off a bit, especially the lowercase v in Av1.
The result should look like this:
res <- c("ALPHA_01","BRAVO_01","CHARLIE_01","ALPHA_12","BRAVO_02","CHARLIE_02", "AVOCADO_01")
I know this should be done with regex, but I have been trying for 2 days now and have not gotten anywhere.
Answer 0: (score: 1)
We can use gsubfn.
library(gsubfn)
#remove the number part from 'ID' (using `sub`) and get the unique elements
nm1 <- unique(sub("\\d+", "", ID))
#using gsubfn, replace the non-numeric elements with the matching
#key/value pair in the replacement
#finally format to add the "_" with sub
sub("(\\d+)$", "_0\\1", gsubfn("(\\D+)", as.list(setNames(names, nm1)), ID))
#[1] "ALPHA_01" "BRAVO_01" "CHARLIE_01" "ALPHA_012"
#[5] "BRAVO_02" "CHARLIE_02" "AVOCADO_01"
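(\\d+) matches one or more digits, and (\\D+) matches one or more non-digit characters. We wrap each pattern in parentheses to capture it as a group, and refer to the captured text in the replacement with the backreference \\1 (the first and only capture group in the pattern).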
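To see the two character classes and the backreference in isolation, here is a small base R sketch. The vector x is only an illustrative input and is not part of the original question.

```r
# Illustrative input (an assumption for this sketch)
x <- c("A1", "A12", "Av1")

# \\D+ matches the leading run of non-digits; removing it leaves the number
digits_only <- sub("\\D+", "", x)

# \\d+ matches the run of digits; removing it leaves the letter prefix
letters_only <- sub("\\d+", "", x)

# (\\d+)$ captures the trailing digits; \\1 puts them back after "_0"
padded <- sub("(\\d+)$", "_0\\1", x)

print(digits_only)
print(letters_only)
print(padded)
```

Note that the last call always inserts _0, so a two-digit number such as 12 comes out as _012.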
If the requirement is to prepend 0 only for IDs whose number is less than 10, we can use a second gsubfn call together with sprintf:
gsubfn("(\\d+)", ~sprintf("_%02d", as.numeric(x)),
gsubfn("(\\D+)", as.list(setNames(names, nm1)), ID))
#[1] "ALPHA_01" "BRAVO_01" "CHARLIE_01" "ALPHA_12"
#[5] "BRAVO_02" "CHARLIE_02" "AVOCADO_01"
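For comparison, the same idea can be sketched without gsubfn. This is a base R alternative, not the answer's method: it splits each ID into prefix and number, looks the prefix up in a named vector, and lets sprintf do the zero padding. The names prefix, number and lookup are introduced here for illustration, and the sketch assumes that names is ordered by the first appearance of each prefix in ID.

```r
ID <- c("A1", "B1", "C1", "A12", "B2", "C2", "Av1")
names <- c("ALPHA", "BRAVO", "CHARLIE", "AVOCADO")

# Split each ID into its letter prefix and its numeric suffix
prefix <- sub("\\d+$", "", ID)
number <- as.integer(sub("^\\D+", "", ID))

# Named lookup: unique prefixes ("A", "B", "C", "Av") mapped to full names.
# Assumption: names follows the order in which the prefixes first appear.
lookup <- setNames(names, unique(prefix))

# %02d pads to two digits, so 1 becomes "01" while 12 stays "12"
res <- sprintf("%s_%02d", unname(lookup[prefix]), number)
print(res)
```

Because the lookup is keyed on the exact prefix, Av and A stay distinct without any case conversion.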
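Answer 1: (score: 1)
Doing this in base R, we can check whether the second character of each name is V (as in AVOCADO) and take a substring of 2 characters if it is, or 1 character if it is not. This gives keys that distinguish AVOCADO from ALPHA. We then match these keys against the letters extracted from ID (converted with toupper so that Av matches AV). Finally, we paste _0 together with the number from each ID: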
paste0(names[match(toupper(sub('\\d+', '', ID)),
ifelse(substr(names, 2, 2) == 'V', substr(names, 1, 2),
substr(names, 1, 1)))],'_0', sub('\\D+', '', ID))
#[1] "ALPHA_01" "BRAVO_01" "CHARLIE_01" "ALPHA_012" "BRAVO_02" "CHARLIE_02" "AVOCADO_01"
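The one-liner above can be unpacked into named steps, which also makes it easy to pad only single-digit numbers. The following is a sketch under the same V-in-second-position assumption as the answer; the variable names keys, idx and num are introduced here for illustration, and formatC is swapped in for the fixed '_0' so that 12 is not turned into 012.

```r
ID <- c("A1", "B1", "C1", "A12", "B2", "C2", "Av1")
names <- c("ALPHA", "BRAVO", "CHARLIE", "AVOCADO")

# Key for each name: two characters if the second one is "V", else one
keys <- ifelse(substr(names, 2, 2) == "V",
               substr(names, 1, 2),
               substr(names, 1, 1))

# Position of each ID's upper-cased letter prefix among the keys
idx <- match(toupper(sub("\\d+", "", ID)), keys)

# Numeric part of each ID, zero-padded to a width of two
num <- formatC(as.integer(sub("\\D+", "", ID)), width = 2, flag = "0")

res <- paste0(names[idx], "_", num)
print(res)
```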