How can I implement radix sort in base R for a vector such as the following?
vec <- c(25, 478, 34, 9021, 6, 9947, 504, 22)
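In short, radix sort does the following:
Sort by the units digit:
9021 22 34 504 25 6 9947 478
Sort by the tens digit:
504 6 9021 22 25 34 9947 478
Sort by the hundreds digit:
6 9021 22 25 34 478 504 9947
Sort by the thousands digit:
6 22 25 34 478 504 9021 9947
And so on. Of course, vec is just an example; the solution should handle a vector of any length containing numbers with any number of digits.
The output should be vec sorted in ascending (or descending) order, that is: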
6 22 25 34 478 504 9021 9947
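A single pass of the procedure described above can be sketched in base R as follows. The helper name digit_at is made up for illustration, and the sketch assumes non-negative whole numbers. It relies on order(..., method = "radix") being a stable ordering, so elements with the same digit keep their current relative order.

```r
vec <- c(25, 478, 34, 9021, 6, 9947, 504, 22)

# Hypothetical helper: the digit of x at position pos (1 = units, 2 = tens, ...)
digit_at <- function(x, pos) (x %/% 10^(pos - 1)) %% 10

# Keys used by the first pass
units <- digit_at(vec, 1)
units
# [1] 5 8 4 1 6 7 4 2

# One stable pass on the units digit
pass1 <- vec[order(units, method = "radix")]
pass1
# [1] 9021   22   34  504   25    6 9947  478
```

Repeating the same step with pos = 2, 3, ... on the result of the previous pass gives the remaining lines of the walkthrough.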
Answer 0 (score: 2)
Here is my own solution:
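f_radixSort <- function(x){
  # number of digits in the largest value (format() avoids "1e+05"-style strings)
  mx <- nchar(format(max(x), scientific = FALSE))
  # pass i orders x by the value of its last i digits
  for (i in seq_len(mx))
    x <- x[order(x %% (10^i))]
  return(x)
}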
Example call, with the ordering after each pass printed:
f_radixSort(vec)
# units
# [1] 9021 22 34 504 25 6 9947 478
# tens
# [1] 504 6 9021 22 25 34 9947 478
# hundreds
# [1] 6 9021 22 25 34 478 504 9947
# thousands
# [1] 6 22 25 34 478 504 9021 9947
# ten thousands
# [1] 6 22 25 34 478 504 9021 9947
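To look at the intermediate orderings yourself, one option is a variant that keeps the vector after every pass. This is only a sketch: the name radix_passes is invented here, it assumes non-negative whole numbers, and it uses the same x %% 10^i key as the function above. Note that in the last pass the key is x itself, so the real sorting work is done by order().

```r
# Hypothetical tracing variant: returns a list with the vector after each pass
radix_passes <- function(x) {
  n_digits <- nchar(format(max(x), scientific = FALSE))
  passes <- vector("list", n_digits)
  for (i in seq_len(n_digits)) {
    x <- x[order(x %% 10^i)]   # order by the last i digits
    passes[[i]] <- x
  }
  passes
}

vec <- c(25, 478, 34, 9021, 6, 9947, 504, 22)
passes <- radix_passes(vec)
for (i in seq_along(passes)) cat("pass", i, ":", passes[[i]], "\n")
```

For vec this gives four passes, and the last element of the list is the fully sorted vector.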
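A short benchmark (I did not include sorting with data.table, because I do not know what algorithm it uses, and besides, I asked for an answer in base R). radix() is the function from Answer 1 below: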
library(microbenchmark)
vec <- c(25, 478, 34, 9021, 6, 9947, 504, 22)
all(radix(vec)==f_radixSort(vec))
# [1] TRUE
microbenchmark(radix(vec), f_radixSort(vec))
# Unit: microseconds
# expr min lq mean median uq max neval
# radix(vec) 857.239 915.230 980.39907 943.4745 1005.071 2081.051 100
# f_radixSort(vec) 39.061 42.216 52.28206 51.0810 54.686 111.775 100
# ========================================================
set.seed(200)
vec<-sample(10000,5000)
all(radix(vec)==f_radixSort(vec))
# [1] TRUE
microbenchmark(radix(vec), f_radixSort(vec))
# Unit: milliseconds
# expr min lq mean median uq max neval
# radix(vec) 6.724506 7.003191 8.135387 7.877256 8.195904 52.786763 100
# f_radixSort(vec) 2.132132 2.167436 2.302167 2.200337 2.268544 4.009464 100
Answer 1 (score: 1)
As far as I know, data.table does this out of the box, so you could use that package; for example, simply setting a key sorts the data:
library(data.table)
vec <- c(25, 478, 34, 9021, 6, 9947, 504, 22)
f1<-function(vec){
DT<-data.table(vec)
setkey(DT, vec)
DT
}
f1(vec)
vec
1: 6
2: 22
3: 25
4: 34
5: 478
6: 504
7: 9021
8: 9947
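If the goal is just a radix-sorted result rather than a hand-written implementation, base R may already be enough: sort() and order() accept method = "radix". A minimal sketch, using the same vec:

```r
vec <- c(25, 478, 34, 9021, 6, 9947, 504, 22)

# Ascending order using base R's built-in radix method
asc <- sort(vec, method = "radix")

# Descending order
desc <- sort(vec, decreasing = TRUE, method = "radix")

# order() takes the same argument and returns the permutation instead
idx <- order(vec, method = "radix")
```

Here vec[idx] is the same as asc.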
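I suppose you could implement the algorithm yourself, but it would probably be slow in R. The function would look something like this: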
library(stringr)

radix <- function(numbers) {
  # number of digits in the largest value (format() avoids scientific notation)
  digits <- nchar(format(max(numbers), scientific = FALSE))
  # left-pad every number with zeros so all strings have the same width
  numbers <- str_pad(format(numbers, scientific = FALSE, trim = TRUE),
                     digits, pad = "0")
  # one column per digit position, most significant digit first
  rad <- data.frame(matrix("0", ncol = digits, nrow = length(numbers)))
  for (i in seq_len(digits)) {
    rad[, i] <- str_sub(numbers, i, i)
  }
  # process the columns from the least to the most significant digit
  for (z in rev(seq_len(ncol(rad)))) {
    # row indices bucket by bucket (digit 0 first, digit 9 last);
    # which() keeps the current row order inside each bucket
    k <- unlist(lapply(0:9, function(d) which(rad[, z] == d)))
    # drop = FALSE keeps rad a data frame even with a single digit column
    rad <- rad[k, , drop = FALSE]
  }
  # paste the digit columns back together and convert to numeric
  as.numeric(do.call(paste0, rad))
}
It could be cleaned up and sped up, but this is radix sort as I understand it:
radix(vec)
[1] 6 22 25 34 478 504 9021 9947
Speed comparison:
microbenchmark(f1(vec), radix(vec))
Unit: microseconds
expr min lq mean median uq max neval
f1(vec) 290.6 314.8 335 327 349.1 524.1 100
radix(vec) 1062.8 1121.7 1458 1163 1250.5 24407.9 100
A larger speed comparison:
set.seed(200)
more<-sample(10000,5000)
microbenchmark(f1(more), radix(more))
expr min lq mean median uq max neval
f1(more) 539.3 565.5 623 622.2 664.8 769.7 100
radix(more) 10457.8 10668.0 11683 11133.7 12298.3 25010.6 100
Answer 2 (score: 0)
My solution looks like this. Please bear with me, I am a beginner ;-) But the result is correct:
radixSort <- function(sortvec) {
mx <- nchar(max(sortvec))
## for all digits up to the number of digits in the longest number:
for (i in 1:mx){
## empty the 10 buckets
bucket <- list()
## for all 10 buckets:
for (bucketnumber in 1:10){
## fill each bucket with the appropriate numbers
bucket[[bucketnumber]] <- sortvec[dig(sortvec, i)==(bucketnumber-1)]
}
## empty the sorted vector
sortvec <- c()
## fill the sorted vector with the contents of buckets 1-10
for (k in 1:10){
sortvec <- c(sortvec, bucket[[k]])
}
}
return(sortvec)
}
dig <- function(x, st) {
## returns the value of digit #st in number x, e.g. dig(3456, 2) returns 5
remainder <- x%%(10^st)
divisor <- 10^(st-1)
return(trunc(remainder/divisor))
}
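The same bucket idea can probably be written without the inner loops by letting split() build the ten buckets. This is only a sketch under the assumption that the input holds non-negative whole numbers; bucket_pass and radix_sort_split are names invented for this example.

```r
# Hypothetical helper: one bucket pass on digit position pos (1 = units)
bucket_pass <- function(x, pos) {
  digit <- (x %/% 10^(pos - 1)) %% 10
  # levels = 0:9 fixes the bucket order; split() keeps the current
  # order of the elements inside each bucket
  buckets <- split(x, factor(digit, levels = 0:9))
  unlist(buckets, use.names = FALSE)
}

# Hypothetical driver: one pass per digit of the largest value
radix_sort_split <- function(x) {
  n_digits <- nchar(format(max(x), scientific = FALSE))
  for (pos in seq_len(n_digits)) x <- bucket_pass(x, pos)
  x
}

vec <- c(25, 478, 34, 9021, 6, 9947, 504, 22)
sorted <- radix_sort_split(vec)
sorted
# [1]    6   22   25   34  478  504 9021 9947
```

For descending order, rev(radix_sort_split(vec)) should do.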