How do you generate long strings of digits?

Date: 2016-05-18 11:57:43

Tags: r string

I want to generate strings made up of many digits. In this case they are ID values in a synthetic dataset.

For short digit strings, I would use sample:

sprintf("%05.f", sample(0:(1e5-1), 18))
##  [1] "54783" "80354" "53607" "99668" "63621" "07121" "15944" "27436" "96837"
## [10] "28751" "95315" "63326" "00981" "15300" "18448" "09885" "63360" "04539"

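This doesn't work for longer strings. First, the memory requirements become too large, and beyond that you can't make the numbers large enough. For example, this fails: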

sprintf("%020.f", sample(0:(1e20-1), 18))
## Error in 0:(1e+20 - 1) : result would be too long a vector
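Even if memory were not a problem, numbers this large cannot be stored exactly: R numerics are IEEE 754 doubles, which represent every integer exactly only up to 2^53 (about 9e15, i.e. 16 digits). A quick base R check of that limit:

```r
# Above 2^53, consecutive integers are no longer all representable as doubles
print(2^53 + 1 == 2^53)    # TRUE: the +1 is lost to rounding

# The largest 20-digit number, 1e20 - 1, rounds back up to 1e20
big <- sprintf("%.0f", 1e20 - 1)
print(big)                 # "100000000000000000000"
print(nchar(big))          # 21 characters, not 20
```

This is why the answers below build the strings digit by digit instead of formatting one large number.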

How can I make digit strings with many digits?

4 Answers:

Answer 0 (score: 7)

You can use the stringi package:

 require(stringi)
 stri_rand_strings(10,50,pattern="[0-9]")
 #[1] "33163217620361477538822791082750025522246331345665"
 #[2] "85105858270154002408385176647161448078668054193081"
 #[3] "62417899981033664011261714060242781925235001978704"
 #[4] "17731152361720663463691231461493607438220463345863"
 #[5] "06316044683426574113640145569673845269595104465896"
 #[6] "17058300286927387520323781399768150137786864069558"
 #[7] "86204984977415277470013113957915963393339586096213"
 #[8] "56382530391794208466245591896055134584746907393458"
 #[9] "61740570216902905237145952608961548203505061535222"
 #[10] "28713530448562268345804947527043822080897315821103"

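The first argument is the length of the result vector, the second is the number of characters in each string, and the third restricts the characters to the digits 0-9.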
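The same call with the arguments named (`n`, `length` and `pattern` are the parameter names of `stri_rand_strings`) may make the roles clearer. `pattern` is a regex-style character class, so `"[0-9]"` allows only the ten digits. The checks at the end are just illustrative sanity tests:

```r
library(stringi)

# 5 strings, each 20 characters long, drawn only from the character class [0-9]
ids <- stri_rand_strings(n = 5, length = 20, pattern = "[0-9]")
print(ids)

# Every element should be exactly 20 characters, all of them digits
stopifnot(length(ids) == 5)
stopifnot(all(grepl("^[0-9]{20}$", ids)))
```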

Sticking with base R, you can try the following, which generates 1000 strings of 50 digits each:

apply(matrix(sample(charToRaw("0123456789"), 50*1000, replace=TRUE), nrow=1000), 1, rawToChar)
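The one-liner can be unpacked into a small function to show each step; the name `rand_digit_strings` is purely illustrative, not part of any package. It samples all the digits in one call as raw bytes, lays them out in a matrix with one row per output string, and converts each row back to text. For `apply(..., 1, ...)` to yield one string per row, `nrow` has to be the number of strings:

```r
# Hypothetical helper: n random strings of k digits each, base R only
rand_digit_strings <- function(n, k) {
  # charToRaw("0123456789") gives the ten ASCII bytes for the digits 0-9
  digits <- sample(charToRaw("0123456789"), n * k, replace = TRUE)
  # One row per output string, k columns (digits) per row
  m <- matrix(digits, nrow = n)
  # rawToChar turns each row of bytes back into a single string
  apply(m, 1, rawToChar)
}

ids <- rand_digit_strings(1000, 50)
length(ids)          # 1000
unique(nchar(ids))   # 50
```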

Answer 1 (score: 6)

A base R alternative:

set.seed(123)
paste0(sample(0:9,50,replace=TRUE),collapse="")
#[1] "27489058549465182039866967552199670472321443112428"

Edit: as @docendodiscimus suggested, this can be combined with replicate() to get any number of such strings:

replicate(10,paste0(sample(0:9,50,replace=TRUE),collapse=""))
# [1] "27489058549465182039866967552199670472321443112428" "04715217836032848874767042363126471498811636317045"
# [3] "53494896419309715954633239101668675687943401822027" "84321352425363357242618766358583725425992396944615"
# [5] "29654832114226073489297603456964502318185616373997" "22525714489869553305800177940671320302062108789107"
# [7] "70776410443470388238821710903962783466694152439326" "19516964381183371044438459723957375912029277122119"
# [9] "91953470363824219340565386331895392614012571877136" "53202887119441522628084764602728369116489047092067"
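Because this approach draws its digits with `sample()`, calling `set.seed()` first makes the generated IDs reproducible, which is often useful for a synthetic dataset. A small check; the wrapper name `make_ids` and the seed value are arbitrary choices for illustration:

```r
# Hypothetical wrapper around the replicate() version: 10 IDs of 50 digits each
make_ids <- function() {
  replicate(10, paste0(sample(0:9, 50, replace = TRUE), collapse = ""))
}

set.seed(2016)
first <- make_ids()
set.seed(2016)
second <- make_ids()

identical(first, second)   # TRUE: same seed, same IDs
```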

Answer 2 (score: 3)

The obligatory speed comparison:

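library(magrittr)        # for %>%
library(stringi)         # for stri_rand_strings()
library(microbenchmark)  # for microbenchmark()

# Answer 3's approach: sample all digits at once, split them into groups, collapse each group
GNS <- function(nNumbers, nCharsPerNumber)
{
  sample(0:9, nNumbers * nCharsPerNumber, replace = TRUE) %>%
    split(gl(nNumbers, nCharsPerNumber)) %>% 
    vapply(paste0, character(1), collapse = "", USE.NAMES = FALSE)
}

# Answer 1's approach: build each string separately with replicate()
GNP <- function(nNumbers, nCharsPerNumber) {
  replicate(nNumbers, paste0(sample(0:9, nCharsPerNumber, replace = TRUE), collapse = ""))
}

# Answer 0's approach: stringi
GST <- function(nNumbers, nCharsPerNumber) {
  stri_rand_strings(nNumbers, nCharsPerNumber, pattern = "[0-9]")
}

# times = 10 must be named; a bare 10 would be benchmarked as a fourth expression
microbenchmark(GNS(1000, 100), GNP(1000, 100), GST(1000, 100), times = 10)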

Results:

Unit: milliseconds
           expr       min        lq     mean    median        uq       max neval
 GNS(1000, 100) 36.832684 38.918858 40.90260 40.750332 41.374165 46.369622    10
 GNP(1000, 100) 36.808395 39.310571 39.99557 40.094511 40.772055 44.025157    10
 GST(1000, 100)  1.882961  1.923672  2.03537  1.983199  2.166911  2.325648    10

We have a clear winner!

Edit: adding another base R option, which is even faster.

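GSAP <- function(nNumbers, nCharsPerNumber) {
  # nrow must be the number of strings, so that apply(..., 1, ...) turns each row into one string
  apply(matrix(sample(charToRaw("0123456789"), nNumbers * nCharsPerNumber, replace = TRUE),
               nrow = nNumbers), 1, rawToChar)
}
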
Unit: microseconds
            expr       min        lq      mean     median       uq       max
 GSAP(1000, 100)   724.584   739.637   821.435   766.8345   899.06  1030.086
  GNS(1000, 100) 36189.180 38316.406 39739.471 39141.5695 39965.02 44478.450
  GNP(1000, 100) 35777.282 36331.839 38448.665 38575.8945 39725.21 43016.281
  GST(1000, 100)  1863.803  1898.013  1944.472  1918.7110  1975.33  2122.094

Edit 2: trying a larger input, and getting the code right this time.

(times in seconds)

     expr       min        lq      mean    median        uq       max neval
 GSAP(x, y)  3.906626  3.975160  4.069103  4.049784  4.163262  4.329284    10
  GNS(x, y) 33.645200 33.972587 34.513555 34.406009 35.141313 35.328662    10
  GNP(x, y) 30.833180 31.136971 33.037422 32.193070 33.010896 41.713811    10
  GST(x, y)  1.697303  1.706599  1.731205  1.735127  1.756961  1.763861    10

So GST still comes out ahead.
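The raw-matrix approach is easy to get subtly wrong: with `apply(..., 1, ...)`, passing the digits per string as `nrow` instead of the number of strings silently returns the transposed shape, and fewer, longer strings can also time differently. A quick shape check; both function names are hypothetical and they differ only in `nrow`:

```r
# nrow = number of strings: nNumbers strings of nCharsPerNumber digits
by_numbers <- function(nNumbers, nCharsPerNumber) {
  digits <- sample(charToRaw("0123456789"), nNumbers * nCharsPerNumber, replace = TRUE)
  apply(matrix(digits, nrow = nNumbers), 1, rawToChar)
}

# nrow = digits per string: transposed result, nCharsPerNumber strings of nNumbers digits
by_chars <- function(nNumbers, nCharsPerNumber) {
  digits <- sample(charToRaw("0123456789"), nNumbers * nCharsPerNumber, replace = TRUE)
  apply(matrix(digits, nrow = nCharsPerNumber), 1, rawToChar)
}

right <- by_numbers(1000, 100)
wrong <- by_chars(1000, 100)
c(length(right), unique(nchar(right)))   # 1000  100
c(length(wrong), unique(nchar(wrong)))   #  100 1000
```

Checking `length()` and `nchar()` of each candidate's output before timing it is a cheap way to make sure a benchmark compares functions that do the same work.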

Answer 3 (score: 2)

Generate all the individual digits, split them up among the numbers, then collapse each number's digits together.

library(magrittr)
generateNumberStrings <- function(nNumbers, nCharsPerNumber)
{
  sample(0:9, nNumbers * nCharsPerNumber, replace = TRUE) %>%
    split(gl(nNumbers, nCharsPerNumber)) %>% 
    vapply(paste0, character(1), collapse = "", USE.NAMES = FALSE)
}

generateNumberStrings(18, 20)
##  [1] "06985095513359117867" "95278964413245221928" "75398392571928201881"
##  [4] "00722065797044523279" "24475619649735183646" "29165493966488037145"
##  [7] "34289922968745727406" "82354362380114534171" "84293845597888728670"
## [10] "97570546918892201649" "41421884356741221760" "99306177663904189401"
## [13] "25668966612346726451" "94949806854834288664" "43664073601604613019"
## [16] "25848242347176214032" "80736828777283687373" "83763855757083999312"
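To see what the `split(gl(...))` step in `generateNumberStrings` does, here is the same pipeline on a fixed vector of 12 digits, written without magrittr. `gl(3, 4)` builds a factor that labels the first four digits 1, the next four 2 and the last four 3, so `split()` groups them into three numbers that `vapply()` then collapses:

```r
digits <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2)

# Factor with 3 levels, each repeated 4 times: 1 1 1 1 2 2 2 2 3 3 3 3
groups <- gl(3, 4)

# A list of three length-4 numeric vectors, one per future string
pieces <- split(digits, groups)

# Collapse each group of digits into one string, dropping the list names
result <- vapply(pieces, paste0, character(1), collapse = "", USE.NAMES = FALSE)
print(result)   # "1234" "5678" "9012"
```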