Repeating a vector of letters

Time: 2014-02-10 15:54:18

Tags: r

Is there a function in R to create a list of repeating letters?

Something like

letters[1:30]
[1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j" "k" "l" "m" "n" "o" "p" "q" "r" "s"
[20] "t" "u" "v" "w" "x" "y" "z" NA  NA  NA  NA

But instead of NA, I would like the output to continue with aa, bb, cc, dd ...

6 answers:

Answer 0 (score: 8)

It is not too difficult to piece together a quick function that does something like this:

myLetters <- function(length.out) {
  a <- rep(letters, length.out = length.out)
  grp <- cumsum(a == "a")
  vapply(seq_along(a), 
         function(x) paste(rep(a[x], grp[x]), collapse = ""),
         character(1L))
}
myLetters(60)
#  [1] "a"   "b"   "c"   "d"   "e"   "f"   "g"   "h"   "i"   "j"   "k"   "l"  
# [13] "m"   "n"   "o"   "p"   "q"   "r"   "s"   "t"   "u"   "v"   "w"   "x"  
# [25] "y"   "z"   "aa"  "bb"  "cc"  "dd"  "ee"  "ff"  "gg"  "hh"  "ii"  "jj" 
# [37] "kk"  "ll"  "mm"  "nn"  "oo"  "pp"  "qq"  "rr"  "ss"  "tt"  "uu"  "vv" 
# [49] "ww"  "xx"  "yy"  "zz"  "aaa" "bbb" "ccc" "ddd" "eee" "fff" "ggg" "hhh"
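The same pattern can also be sketched without the per-element vapply() call. This is only an illustrative variant, not the answer's own code: my_letters_vec is a hypothetical name, and it assumes R >= 3.3.0, where base R provides strrep().

```r
# Vectorised sketch; my_letters_vec is a hypothetical name (needs strrep(), R >= 3.3.0)
my_letters_vec <- function(length.out) {
  idx   <- seq_len(length.out)
  base  <- rep(letters, length.out = length.out)  # a..z recycled to the requested length
  times <- (idx - 1) %/% 26 + 1                   # 1 for items 1-26, 2 for items 27-52, ...
  strrep(base, times)                             # repeat each letter `times` times
}

my_letters_vec(60)[c(1, 26, 27, 52, 53, 60)]
# [1] "a"   "z"   "aa"  "zz"  "aaa" "hhh"
```

The repeat count is computed from the position alone, so no grouping step such as cumsum(a == "a") is needed.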

Answer 1 (score: 8)

If you just want unique names, you could use

make.unique(rep(letters, length.out = 30), sep='')
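As a quick check of what that call returns: make.unique() appends a counter to duplicated entries, so the names past "z" come out as "a1", "b1", ... rather than "aa", "bb", .... The variable x below is just for illustration.

```r
# make.unique() keeps the first occurrence and numbers the later duplicates
x <- make.unique(rep(letters, length.out = 30), sep = "")
tail(x, 6)
# [1] "y"  "z"  "a1" "b1" "c1" "d1"
```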

Edit:

Here is another way to repeat the letters, using Reduce.

myletters <- function(n) 
unlist(Reduce(paste0, 
       replicate(n %/% length(letters), letters, simplify=FALSE),
       init=letters,
       accumulate=TRUE))[1:n]

myletters(60)
#  [1] "a"   "b"   "c"   "d"   "e"   "f"   "g"   "h"   "i"   "j"   "k"   "l"  
# [13] "m"   "n"   "o"   "p"   "q"   "r"   "s"   "t"   "u"   "v"   "w"   "x"  
# [25] "y"   "z"   "aa"  "bb"  "cc"  "dd"  "ee"  "ff"  "gg"  "hh"  "ii"  "jj" 
# [37] "kk"  "ll"  "mm"  "nn"  "oo"  "pp"  "qq"  "rr"  "ss"  "tt"  "uu"  "vv" 
# [49] "ww"  "xx"  "yy"  "zz"  "aaa" "bbb" "ccc" "ddd" "eee" "fff" "ggg" "hhh"

Answer 2 (score: 6)

Working solution

A function to produce Excel-style column names, i.e.

# A, B, ..., Z, AA, AB, ..., AZ, BA, BB, ..., ..., ZZ, AAA, ...

letterwrap <- function(n, depth = 1) {
    args <- lapply(1:depth, FUN = function(x) return(LETTERS))
    x <- do.call(expand.grid, args = list(args, stringsAsFactors = F))
    x <- x[, rev(names(x)), drop = F]
    x <- do.call(paste0, x)
    if (n <= length(x)) return(x[1:n])
    return(c(x, letterwrap(n - length(x), depth = depth + 1)))
}

letterwrap(26^2 + 52) # through AAZ
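One detail of letterwrap that may not be obvious is the column reversal, x[, rev(names(x))]. A minimal sketch with a two-letter alphabet shows why it is there: expand.grid() varies its first column fastest, which is the opposite of the Excel ordering. The names g, as_is and reversed are illustrative only.

```r
# expand.grid() varies the FIRST column fastest
g <- expand.grid(c("A", "B"), c("A", "B"), stringsAsFactors = FALSE)

as_is <- do.call(paste0, g)                      # columns pasted in their original order
as_is
# [1] "AA" "BA" "AB" "BB"

reversed <- do.call(paste0, g[, rev(names(g))])  # columns reversed before pasting
reversed
# [1] "AA" "AB" "BA" "BB"
```

Only the reversed version makes the last letter change fastest, as in AA, AB, ..., AZ, BA.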

Botched attempt

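Initially I thought this would best be done cleverly by converting to base 26, but that does not work. The issue is that Excel column names are not base 26, which took me a long time to realize. The catch is 0: if you try to map a letter (such as A) to 0, you run into trouble as soon as you want to distinguish between names such as AA and AAA, because leading zeros do not change a number's value.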
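The catch can be made concrete with a small sketch. name_value is a hypothetical helper, not part of the answer: it reads a name as a string of base-26 digits, with "A" mapped to a_is.

```r
# name_value is a hypothetical helper: treat each letter as a base-26 digit,
# with "A" mapped to `a_is` (0 or 1), and sum digit * 26^position
name_value <- function(x, a_is) {
  digits <- utf8ToInt(x) - utf8ToInt("A") + a_is
  sum(digits * 26^(rev(seq_along(digits)) - 1))
}

nms <- c("A", "AA", "AAA")
unname(sapply(nms, name_value, a_is = 0))  # A as 0: all three names collapse
# [1] 0 0 0
unname(sapply(nms, name_value, a_is = 1))  # A as 1: the names stay distinct
# [1]   1  27 703
```

With A as 0 the three names are indistinguishable, which is the problem described above. With A as 1 there is no zero digit at all, so this is not ordinary base 26 either.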

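Another way to illustrate the problem is in terms of "digits". In base 10 there are 10 single-digit numbers (0-9), then 90 two-digit numbers (10:99), 900 three-digit numbers, and so on, generalizing to 10^d - 10^(d - 1) numbers with d digits (for d > 1). In Excel column names, however, there are 26 single-letter names, 26^2 two-letter names, 26^3 three-letter names, with no subtraction.

I'll leave this code as a warning to others: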

Answer 3 (score: 3)

There is almost certainly a better way, but this is what I ended up with:

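letter_wrap <- function(idx) {
  vapply(
    idx,
    function(x) {
      pos  <- replace(x %% 26, !x %% 26, 26)  # position in the alphabet; a remainder of 0 means "z"
      reps <- 1 + (x - 1) %/% 26              # how many times the letter is repeated
      paste0(rep(letters[pos], reps), collapse = "")
    },
    ""
  )
}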
letter_wrap(1:60)
#  [1] "a"   "b"   "c"   "d"   "e"   "f"   "g"   "h"   "i"   "j"   "k"   "l"   "m"   "n"  
# [15] "o"   "p"   "q"   "r"   "s"   "t"   "u"   "v"   "w"   "x"   "y"   "z"   "aa"  "bb" 
# [29] "cc"  "dd"  "ee"  "ff"  "gg"  "hh"  "ii"  "jj"  "kk"  "ll"  "mm"  "nn"  "oo"  "pp" 
# [43] "qq"  "rr"  "ss"  "tt"  "uu"  "vv"  "ww"  "xx"  "yy"  "zz"  "aaa" "bbb" "ccc" "ddd"
# [57] "eee" "fff" "ggg" "hhh"
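
Edit: I did not notice Ananda's answer before I posted. This one is different enough that I am leaving it. Note that it takes an index vector as input, rather than the number of items.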
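Because the input is an index vector, the function can be asked for arbitrary positions without generating everything before them. A short sketch follows; the function body is a compact copy of the answer's logic so that the block runs on its own, and the chosen indices are arbitrary.

```r
# Compact copy of letter_wrap from this answer, repeated here so the example is self-contained
letter_wrap <- function(idx) {
  vapply(idx, function(x) {
    pos <- replace(x %% 26, !x %% 26, 26)           # remainder 0 maps to position 26 ("z")
    paste0(rep(letters[pos], 1 + (x - 1) %/% 26), collapse = "")
  }, "")
}

letter_wrap(c(3, 26, 27, 78, 79))
# [1] "c"    "z"    "aa"   "zzz"  "aaaa"
```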

Answer 4 (score: 2)

Probably not the cleanest, but it is easy to see what is happening:

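foo <- letters
outlen <- 73 # or whatever length you want
oof <- character(26)
full <- outlen %/% 26  # number of complete 26-letter blocks
# blocks 2, 3, ...: "aa".."zz", "aaa".."zzz", ... (seq_len avoids a backwards 2:1 loop)
for (j in seq_len(full)[-1]) {
    for (k in 1:26) oof[k] <- paste(rep(letters[k], j), collapse = '')
    foo <- c(foo, oof)
}
# partial last block; seq_len() skips it when outlen is a multiple of 26
for (jj in seq_len(outlen %% 26)) foo[26 * full + jj] <- paste(rep(letters[jj], full + 1), collapse = '')
foo <- foo[seq_len(outlen)]  # trim when outlen is below 26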

foo
[1] "a"   "b"   "c"   "d"   "e"   "f"   "g"   "h"   "i"   "j"   "k"   "l"   "m"   "n"  
[15] "o"   "p"   "q"   "r"   "s"   "t"   "u"   "v"   "w"   "x"   "y"   "z"   "aa"  "bb" 
[29] "cc"  "dd"  "ee"  "ff"  "gg"  "hh"  "ii"  "jj"  "kk"  "ll"  "mm"  "nn"  "oo"  "pp" 
[43] "qq"  "rr"  "ss"  "tt"  "uu"  "vv"  "ww"  "xx"  "yy"  "zz"  "aaa" "bbb" "ccc" "ddd"
[57] "eee" "fff" "ggg" "hhh" "iii" "jjj" "kkk" "lll" "mmm" "nnn" "ooo" "ppp" "qqq" "rrr"
[71] "sss" "ttt" "uuu"

Edit: Matthew wins, hands down:

microbenchmark(anandaLetters(5000),matthewletters(5000),carlletters(5000),times=10)
Unit: milliseconds
                 expr       min        lq     median        uq        max neval
  anandaLetters(5000) 85.339200 85.567978 85.9827715 86.260298  86.612231    10
 matthewletters(5000)  3.413706  3.503506  3.9067535  3.946950   4.106453    10
    carlletters(5000) 94.893983 95.405418 96.4492430 97.234784 110.681780    10

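Answer 5 (score: 0)

Let me offer a correction for sequences such as "AY" ... "BZ": when a digit comes out as zero, you have to borrow one from the preceding letter digit.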

colExcel2num <- function(x) {
  p <- seq(from = nchar(x) - 1, to = 0)
  y <- utf8ToInt(x) - utf8ToInt("A") + 1L
  S <- sum(y * 26^p)
  return(S)
}

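## Converts a number to base 26, returns a vector with one element per "digit"
b26 <- function(n) {
  stopifnot(n >= 0)
  if (n <= 1) return(n)
  n26 <- c()
  while (n > 0) {            # peel off digits until nothing is left (avoids rounding in log())
    n26 <- c(n %% 26, n26)   # prepend, so the most significant digit comes first
    n <- n %/% 26
  }
  return(n26)
}

## Returns the Excel column name for a given column position
## A, B, C, ..., Z, AA, AB, AC, ..., AZ, BA, ...
colnum2Excel <- function(n, lower = FALSE) {
  stopifnot(n >= 1)
  let <- if (lower) letters else LETTERS
  base26 <- b26(n)
  # Excel names have no zero digit: working from the right, turn each 0 into 26 ("Z")
  # and borrow 1 from the digit to its left, letting borrows cascade (e.g. 702 -> "ZZ")
  for (k in rev(seq_along(base26)[-1])) {
    if (base26[k] <= 0) {
      base26[k] <- base26[k] + 26
      base26[k - 1] <- base26[k - 1] - 1
    }
  }
  # a leading 0 left over from borrowing selects nothing and is dropped
  paste(let[base26], collapse = "")
}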

## Returns the Excel column name of a df column, given the column's name,
## by looking up the column's position in the df
## A, B, C, ..., Z, AA, AB, AC, ..., AZ, BA, ...
varnum2Excel <- function(df, colname, lower = FALSE) {
  index <- match(colname, names(df))
  stopifnot(index > 0)  # match() returns NA for an unknown column, which fails here
  return(colnum2Excel(index, lower = lower))
}

Here is an example:

library(openxlsx)
library(dplyr)  # for %>%, mutate() and n()
table <- data.frame(milk = c(1,2,3), oranges = c(2,4,6))


table <- table %>%
  mutate(
    ajjhh = sprintf(paste0(
      varnum2Excel(.,"milk"), "%1$s", " + ", 
      varnum2Excel(.,"oranges"),"%1$s"),
      2:(n()+1)    
    )
  )

class(table$ajjhh) <- c(class(table$ajjhh), "formula")
wb <- createWorkbook()
addWorksheet(wb = wb, sheetName = "Sheet1", tabColour = "chocolate4")
writeData (wb, "Sheet1", x = table)
saveWorkbook(wb, "formulashasnotgone.xlsx", overwrite = TRUE)