I have a list of character vectors, and I want to access the last value of each element.
mylist<-list(A=c("a"),
B=c("a","b"),
C=c("a","b","c"),
D=c("a","b","c","d"))
At first (after looking at some related threads about Python), I thought I could do something like this:
for(i in 1:length(mylist)){
print(mylist[[i]][-1])
}
# character(0)
# [1] "b"
# [1] "b" "c"
# [1] "b" "c" "d"
I guess that doesn't work. So basically, what I want is:
myfunction<-function(mylist){
output<-as.character()
for(i in 1:length(mylist)){
output<-c(output, mylist[[i]][length(mylist[[i]])])}
return(output)
}
myfunction(mylist)
# [1] "a" "b" "c" "d"
Is there a more efficient way?
Answer 0: (score: 4)
As Rich Scriven pointed out in a (now deleted) comment, there are many ways to accomplish this. One of them is to use sapply together with tail and the argument n = 1:
sapply(mylist, tail, n = 1)
# A B C D
#"a" "b" "c" "d"
Another option, a safer and potentially faster variant of the same idea, is to use vapply:
vapply(mylist, tail, FUN.VALUE = character(1), n = 1)
# or a little shorter
# vapply(mylist, tail, "", 1)
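To illustrate what "safer" can mean here, the following is a minimal sketch comparing the return types of sapply and vapply on awkward input. The inputs empty_list and with_empty are hypothetical and do not come from the question.

```r
# Hypothetical inputs, not taken from the original question
empty_list <- list()
with_empty <- list(A = c("a", "b"), B = character(0))

# sapply() changes its return type depending on what it is given
res_empty <- sapply(empty_list, tail, n = 1)  # an empty list, not character(0)
res_mixed <- sapply(with_empty, tail, n = 1)  # a list, because result lengths differ

# vapply() either returns the type declared in FUN.VALUE or stops with an error
ok_empty <- vapply(empty_list, tail, FUN.VALUE = character(1), n = 1)
err <- tryCatch(
  vapply(with_empty, tail, FUN.VALUE = character(1), n = 1),
  error = function(e) "error"  # the zero-length result for B violates character(1)
)
```

With vapply, code that follows can rely on getting a character vector back, whereas sapply may hand over a list without any warning.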
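(Yet another) benchmark
library(microbenchmark)
library(ggplot2)  # provides the autoplot() generic used below
set.seed(1)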
mylist <- replicate(1e5, list(sample(letters, size = runif(1, 1, length(letters)))))
benchmark <- microbenchmark(
f1 = {myfunction(mylist)},
f2 = {sapply(mylist, function(l) l[length(l)])},
f3 = {vapply(mylist, function(l) l[length(l)], "")},
f4 = {sapply(mylist, tail, 1)},
f5 = {vapply(mylist, tail, "", 1)},
f6 = {mapply("[", mylist, lengths(mylist))},
f7 = {mapply("[[", mylist, lengths(mylist))}, # added this out of curiosity
f8 = {unlist(mylist)[cumsum(lengths(mylist))]},
times = 100L
)
autoplot(benchmark)
Same result here: Rich's unlist(mylist)[cumsum(lengths(mylist))] is by far the fastest. There seems to be no real difference between sapply and vapply. myfunction(), as defined in OP's question, is by far the slowest.
#benchmark
#Unit: milliseconds
# expr min lq mean median uq max neval
# f1 28797.26121 30462.16785 31836.26875 31191.7762 32950.92537 36586.5477 100
# f2 106.34213 117.75074 127.97763 124.9191 134.82047 176.2058 100
# f3 99.72042 106.87308 119.59811 113.9663 123.63619 465.5335 100
# f4 1242.11950 1291.38411 1409.35750 1350.3460 1505.76089 1880.6537 100
# f5 1189.22615 1274.48390 1366.07234 1333.8885 1418.75394 1942.2803 100
# f6 112.27316 123.73429 132.39888 129.8220 138.33851 191.2509 100
# f7 107.27392 118.19201 128.06681 123.1317 133.29827 208.8425 100
# f8 28.03948 28.84125 31.19637 30.3115 32.94077 40.9624 100
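The unlist/cumsum approach may be less obvious than the others, so here is a small sketch of why it works, using the data from the question. The names flat, ends, and with_empty are introduced only for illustration. The sketch also shows a caveat that the benchmark data does not exercise: the trick assumes every element has at least one value.

```r
mylist <- list(A = c("a"),
               B = c("a", "b"),
               C = c("a", "b", "c"),
               D = c("a", "b", "c", "d"))

flat <- unlist(mylist, use.names = FALSE)  # all 10 values in one flat vector
ends <- cumsum(lengths(mylist))            # end position of each element: 1, 3, 6, 10
last_values <- flat[ends]                  # "a" "b" "c" "d"

# Caveat: a zero-length element repeats the previous end position,
# so it silently picks up the last value of the element before it.
with_empty <- list(A = c("a", "b"), B = character(0), C = "z")
wrong <- unlist(with_empty, use.names = FALSE)[cumsum(lengths(with_empty))]
# wrong is c("b", "b", "z"): B has no values but is reported as "b"
```

If empty elements are possible, one of the per-element approaches (for example the vapply variant, which fails loudly) is probably the safer choice.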
Answer 1: (score: 3)
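Benchmarking the solutions proposed in the comments, we find that Rich's proposal using unlist is the fastest.
By inspecting its code and tuning the arguments, we can make it even faster.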
The slowness of tail is discussed here: https://stackoverflow.com/a/37238415/2270475
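As a rough sketch of the alternative to tail: for plain vectors, indexing with length() gives the same value as tail(x, 1) without going through a generic function. The helper name last_elem is hypothetical, and the remark about where the overhead comes from is an assumption; the linked answer has the details.

```r
# Hypothetical helper: the direct-indexing form used in the benchmarks
last_elem <- function(x) x[length(x)]

x <- c("a", "b", "c")
same_value <- identical(last_elem(x), tail(x, 1))

# Both forms also agree on a zero-length vector (each returns character(0))
same_empty <- identical(last_elem(character(0)), tail(character(0), 1))

# Applied to the list from the question
mylist <- list(A = c("a"),
               B = c("a", "b"),
               C = c("a", "b", "c"),
               D = c("a", "b", "c", "d"))
by_index <- vapply(mylist, last_elem, character(1))
by_tail  <- vapply(mylist, tail, character(1), n = 1)
```

Both calls return the same named character vector, so swapping tail for direct indexing changes only the cost, not the result, for data shaped like this.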
On OP's sample data:
library(microbenchmark)
microbenchmark(
r2evans = sapply(mylist, function(l) l[length(l)]),
markus = sapply(mylist, tail, 1),
Rich1 = mapply("[", mylist, lengths(mylist)),
Rich2 = unlist(mylist)[cumsum(lengths(mylist))],
markus2 = vapply(mylist, tail, character(1), 1),
mm = .Internal(unlist(mylist,FALSE,FALSE))[cumsum(lengths(mylist,FALSE))],
unit = "relative"
)
# Unit: relative
# expr min lq mean median uq max neval
# r2evans 16.083333 12.764706 25.545957 12.368421 13.133333 122.1428571 100
# markus 82.333333 59.294118 50.937673 60.342105 60.644444 10.2253968 100
# Rich1 19.583333 15.294118 13.368047 15.394737 15.622222 2.7492063 100
# Rich2 4.166667 3.705882 3.211045 3.789474 3.911111 0.7650794 100
# markus2 73.166667 53.176471 44.669822 50.263158 54.155556 10.4857143 100
# mm 1.000000 1.000000 1.000000 1.000000 1.000000 1.0000000 100
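The mm entry calls .Internal() directly, which is not part of R's public API and is generally discouraged in user code. A possible middle ground, shown here only as a sketch, is to pass the same settings (no recursion, no names) through the documented arguments of unlist() and lengths(). The function name fast_last is hypothetical, and no timing is claimed for it.

```r
mylist <- list(A = c("a"),
               B = c("a", "b"),
               C = c("a", "b", "c"),
               D = c("a", "b", "c", "d"))

# Hypothetical helper: same idea as `mm`, but using only documented arguments
fast_last <- function(x) {
  flat <- unlist(x, recursive = FALSE, use.names = FALSE)  # skip name construction
  flat[cumsum(lengths(x, use.names = FALSE))]              # index the end positions
}

result <- fast_last(mylist)
```

Skipping name construction is presumably where much of the saving comes from, but that would need to be confirmed with a benchmark on the data at hand.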
On a list 1000 times longer:
mylist_long <- do.call(c, replicate(1000, mylist, simplify = FALSE))
length(mylist_long) # [1] 4000
microbenchmark(
r2evans = sapply(mylist_long, function(l) l[length(l)]),
markus = sapply(mylist_long, tail, 1),
Rich1 = mapply("[", mylist_long, lengths(mylist_long)),
Rich2 = unlist(mylist_long)[cumsum(lengths(mylist_long))],
markus2 = vapply(mylist_long, tail, character(1), 1),
mm = .Internal(unlist(mylist_long,FALSE,FALSE))[cumsum(lengths(mylist_long,FALSE))],
unit = "relative"
)
# Unit: relative
# expr min lq mean median uq max neval
# r2evans 26.14882 27.20436 27.07436 28.13731 28.54701 27.23846 100
# markus 679.57251 698.84828 668.00160 715.30180 674.71067 443.42502 100
# Rich1 27.53607 28.80581 29.82736 29.00353 31.02343 38.79978 100
# Rich2 22.39863 21.79129 20.41467 21.53371 20.70750 13.03032 100
# markus2 667.97494 702.14882 676.91881 718.41899 696.11934 633.17181 100
# mm 1.00000 1.00000 1.00000 1.00000 1.00000 1.00000 100