Faster paste in a data frame

Time: 2015-04-21 14:32:52

Tags: r

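I am trying to fix a column in a data frame, but it is taking too long. I want to find the entries that are exactly 4 characters long and prepend a zero to each of them. The data frame has 2608475 rows.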

I wrote this code in R:

i <- NULL
for (i in 1:length(cest07$CNAE.2.0.Classe)) {
  if (nchar(cest07$CNAE.2.0.Classe[i])==4) {
    cest07$CNAE.2.0.Classe[i] <- paste("0", cest07$CNAE.2.0.Classe[i], sep="")
  }
} 

Can anyone help?

1 answer:

Answer 0 (score: 2)

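Here is a vectorized version, which builds a logical index once and prepends the zero to all matching entries in a single call instead of looping over rows: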

### create example data set
set.seed(1)
str_len <- rpois(25, 1.2) + 1
tmp <- sapply(str_len, function(x) paste(LETTERS[seq_len(x)], collapse=""))

tmp
#  [1] "A"     "AB"    "AB"    "ABCD"  "A"     "ABCD"  "ABCD"  "AB"    "AB"
# [10] "A"     "A"     "A"     "ABC"   "AB"    "ABC"   "AB"    "ABC"   "ABCDE"
# [19] "AB"    "ABC"   "ABCD"  "A"     "AB"    "A"     "A"

### prepend '0'
ind <- (nchar(tmp) == 4)
tmp[ind] <- paste0("0", tmp[ind])

tmp
#  [1] "A"     "AB"    "AB"    "0ABCD" "A"     "0ABCD" "0ABCD" "AB"    "AB"
# [10] "A"     "A"     "A"     "ABC"   "AB"    "ABC"   "AB"    "ABC"   "ABCDE"
# [19] "AB"    "ABC"   "0ABCD" "A"     "AB"    "A"     "A"
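
The same idea can be applied directly to the column from the question. The following is a minimal sketch, not a tested solution for the real data: the small `cest07` data frame built here is a made-up stand-in for the asker's 2608475-row table, and the `as.character()` conversion and the `NA` guard are assumptions about what the real column might contain.

```r
### hypothetical stand-in for the real cest07 data frame
cest07 <- data.frame(
  CNAE.2.0.Classe = c("1234", "01234", "5678", "12", NA),
  stringsAsFactors = FALSE
)

### work on a character copy of the column (assumption: the real
### column may be numeric or factor, so convert explicitly)
col <- as.character(cest07$CNAE.2.0.Classe)

### TRUE only for non-missing entries with exactly 4 characters
ind <- !is.na(col) & nchar(col) == 4

### prepend '0' to the matching entries in one vectorized call
col[ind] <- paste0("0", col[ind])
cest07$CNAE.2.0.Classe <- col

cest07$CNAE.2.0.Classe
# [1] "01234" "01234" "05678" "12"    NA
```

Excluding `NA` from the index keeps missing values untouched and avoids an error from `NA` subscripts in the assignment.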