I have a series of column names that I am trying to standardize.
names <- c("apple", "banana", "orange", "apple1", "apple2", "apple10", "apple11", "banana2", "banana12")
I would like anything with a single digit to be zero-padded, so
apple
banana
orange
apple01
apple02
apple10
apple11
banana02
...
I have been trying to use stringr:
strdouble <- str_detect(names, "[0-9]{2}")
strsingle <- str_detect(names, "[0-9]")
str_detect(names[strsingle & !strdouble])
but I can't figure out how to selectively replace/prepend...
Answer 0 (score: 8)
You can use sub("([a-z])([0-9])$","\\10\\2",names):
[1] "apple" "banana" "orange" "apple01" "apple02" "apple10" "apple11" "banana02"
[9] "banana12"
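It only changes names where a single digit follows a letter ($ is the end of the string). \\1 selects the first group in (): the letter. Then it inserts a leading 0, followed by the second group in (): the digit.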
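To make the group-by-group explanation concrete, here is a minimal sketch that wraps the same pattern in a function. The name pad_single_digit and the input vector are made up for illustration; the pattern and replacement are the ones from this answer.

```r
# Hypothetical helper; the regex and replacement come from the answer above.
pad_single_digit <- function(x) {
  # ([a-z])   group 1: the letter directly before the digit
  # ([0-9])$  group 2: exactly one digit at the very end of the string
  # "\\10\\2" writes back group 1, a literal "0", then group 2
  sub("([a-z])([0-9])$", "\\10\\2", x)
}

res <- pad_single_digit(c("apple", "apple1", "apple10", "x_1"))
print(res)
```

Note that "x_1" is left unchanged in this sketch, because "_" is not matched by [a-z]; the pattern assumes the digit comes right after a lowercase letter.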
Answer 1 (score: 6)
Here is an option that uses negative lookahead and lookbehind assertions to identify single digits.
gsub('(?<!\\d)(\\d)(?!\\d)', '0\\1', names, perl=TRUE)
# [1] "apple" "banana" "orange" "apple01" "apple02" "apple10" "apple11" "banana02" "banana12"
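Because the lookarounds only test the neighbouring characters, this pattern is not tied to the end of the string. The sketch below illustrates that; pad_lone_digits and the third test string are made up for illustration and are not part of the answer.

```r
# Hypothetical wrapper around the pattern from the answer above.
pad_lone_digits <- function(x) {
  # (?<!\\d)  the character before must not be a digit
  # (\\d)     the single digit to pad
  # (?!\\d)   the character after must not be a digit
  gsub("(?<!\\d)(\\d)(?!\\d)", "0\\1", x, perl = TRUE)
}

res <- pad_lone_digits(c("apple1", "apple10", "x1y22z3"))
print(res)
# [1] "apple01"   "apple10"   "x01y22z03"
```

Whether padding digits in the middle of a name is wanted depends on the data; for names that only carry a trailing number, the result matches the other answers.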
Answer 2 (score: 1)
Using str_pad from stringr:
library(stringr)
pad_if = function(x, cond, n, fill = "0") str_pad(x, n*cond, pad = fill)
s = str_split_fixed(names,"(?=\\d)",2)
# [,1] [,2]
# [1,] "apple" ""
# [2,] "banana" ""
# [3,] "orange" ""
# [4,] "apple" "1"
# [5,] "apple" "2"
# [6,] "apple" "10"
# [7,] "apple" "11"
# [8,] "banana" "2"
# [9,] "banana" "12"
paste0(s[,1], pad_if(s[,2], cond = nchar(s[,2]) > 0, n = max(nchar(s[,2]))))
# [1] "apple" "banana" "orange" "apple01" "apple02" "apple10" "apple11" "banana02" "banana12"
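This also extends to cases such as going from c("a","a2","a20","a202") to c("a","a002","a020","a202"), which the other approaches do not cover.
The stringr package is built on stringi, which, I'm guessing, offers all the same functionality used here.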
With sprintf from base R, taking a similar approach:
pad_if2 = function(x, cond, n, fill = "0")
replace(x, cond, sprintf(paste0("%",fill,n,"d"), as.numeric(x)[cond]))
s0 = strsplit(names,"(?<=\\D)(?=\\d)|$",perl=TRUE)
s1 = sapply(s0,`[`,1)
s2 = sapply(sapply(s0,`[`,-1), paste0, "")
paste0(s1, pad_if2(s2, cond = nchar(s2) > 0, n = max(nchar(s2))))
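pad_if2 is less generally usable than pad_if, since it requires that x can be coerced to numeric. Every step here is clunkier than the corresponding code using the packages mentioned above.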
Answer 3 (score: 0)
The key is to identify the single digit using $ and the letter before the digit. You could try:
gsub('([^0-9])([0-9])$','\\10\\2',names)
[1] "apple" "banana" "orange" "apple01" "apple02" "apple10" "apple11" "banana02" "banana12"
Or with a lookbehind:
gsub('(?<=[a-z])(\\d)$','0\\1',names,perl=TRUE)
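One detail worth keeping in mind with patterns like these: gsub replaces the entire match, so a character matched in front of the digit is dropped unless it is captured and written back, or checked with a zero-width lookbehind instead. A small sketch of the three variants (the input vector is made up for illustration):

```r
x <- c("apple1", "apple10", "banana2")

# The preceding non-digit is part of the match but not of the replacement,
# so it is lost.
consumed <- gsub("[^0-9]([0-9])$", "0\\1", x)

# Capturing it and writing it back keeps it.
captured <- gsub("([^0-9])([0-9])$", "\\10\\2", x)

# A lookbehind checks for the letter without making it part of the match.
lookbehind <- gsub("(?<=[a-z])(\\d)$", "0\\1", x, perl = TRUE)

print(consumed)    # "appl01"  "apple10" "banan02"
print(captured)    # "apple01" "apple10" "banana02"
print(lookbehind)  # "apple01" "apple10" "banana02"
```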