我有一个数据集,其中包含代码和标题范围:
library(data.table)
dataset <- data.table(
start = c("A00", "A20", "C10", "F00"),
end = c("A09", "A35", "C19", "F15"),
title = c("title1", "title2", "title3", "title4"))
#> start end title
#> 1: A00 A09 title1
#> 2: A20 A35 title2
#> 3: C10 C19 title3
#> 4: F00 F20 title4
期望的结果:
#> code title
#> 1: A00 title1
#> 2: A01 title1
#> 3: A02 title1
#> 4: A03 title1
#> 5: A04 title1
#> 6: A05 title1
#> 7: A06 title1
#> 8: A07 title1
#> 9: A08 title1
#> 10: A09 title1
#> 11: A20 title2
#> 12: A21 title2
#> 13: A22 title2
#> 14: A23 title2
#> 15: A24 title2
#> 16: A25 title2
#> 17: A26 title2
#> 18: A27 title2
#> 19: A28 title2
#> 20: A29 title2
#> 21: A30 title2
#> 22: A31 title2
#> 23: A32 title2
#> 24: A33 title2
#> 25: A34 title2
#> 26: A35 title2
#> 27: C10 title3
#> 28: C11 title3
#> 29: C12 title3
#> 30: C13 title3
#> 31: C14 title3
#> 32: C15 title3
#> 33: C16 title3
#> 34: C17 title3
#> 35: C18 title3
#> 36: C19 title3
#> 37: F00 title4
#> 38: F01 title4
#> 39: F02 title4
#> 40: F03 title4
#> 41: F04 title4
#> 42: F05 title4
#> 43: F06 title4
#> 44: F07 title4
#> 45: F08 title4
#> 46: F09 title4
#> 47: F10 title4
#> 48: F11 title4
#> 49: F12 title4
#> 50: F13 title4
#> 51: F14 title4
#> 52: F15 title4
#> code title
我目前的解决方案是:
seq_code <- function(start, end) {
letter <- substr(start, 1, 1)
start <- substr(start, 2, 3)
end <- substr(end, 2, 3)
paste0(letter, sprintf("%.2d", start:end))
}
rbindlist(lapply(1:nrow(dataset), function(i) {
dataset[i, list(code = seq_code(start, end), title = title)]
}))
是否有更优雅,更快速的解决方案?
UPD :结果我找到了基于@MichaelChirico建议的解决方案。
dataset[, list(code = seq_code(start, end)), by = title]
Bencmhark:
microbenchmark::microbenchmark(
lapply = rbindlist(lapply(1:nrow(dataset), function(i) {
dataset[i, list(code = seq_code(start, end), title = title)]
})),
by = dataset[, list(code = seq_code(start, end)), by = title]
)
#> Unit: microseconds
#> expr min lq mean median uq max neval cld
#> lapply 2024.874 2065.387 2166.9491 2085.2535 2149.1420 4979.722 100 b
#> by 486.404 510.853 531.5532 519.6025 536.6735 821.413 100 a
答案 0 :(得分:5)
够好吗? (不完全是因为列被颠倒了,但setcolorder
将完成这项工作,如果这是至关重要的)
dataset[ ,
{x <- substr(start, 1, 1)
s <- as.integer(substr(start, 2, 3))
e <- as.integer(substr(end, 2, 3))
.(code=paste0(x, sprintf("%02d", s:e)))}, by = title]
这里有一个我认为看起来更好的替代方案,但结果却非常混乱:
dataset[,
{x<-unlist(lapply(
.SD,tstrsplit,split="(?<=[[:alpha:]])",perl=T))
.(code=paste0(
x[1],sprintf("%02d",do.call(
"seq",as.list(as.integer(x[c(2,4)]))))))},
by=title]