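Rank multiple columns in different orders using data.table

Time: 2017-09-18 13:18:17

Tags: r data.table

Using the example below, how can I rank multiple columns in different orders, e.g. rank y in descending order and z in ascending order?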

library(data.table)

dt <- data.table(x = c(rep("a", 5), rep("b", 5)),
                 y = abs(rnorm(10)) * 10, z = abs(rnorm(10)) * 10)

cols <- c("y", "z")

# this ranks both columns in ascending order within each group of x
dt[, paste0("rank_", cols) := lapply(.SD, function(x) frankv(x, ties.method = "min")),
   .SDcols = cols, by = .(x)]

1 answer:

Answer 0: (score: 0)

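data.table's frank() function has some useful features which are not available in base R's rank() function (see ?frank). For example, we can reverse the order of the ranking by prefixing the variable with a minus sign:
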
library(data.table)
# create reproducible data
set.seed(1L)
dt <- data.table(x = c(rep("a", 5), rep("b", 5)),
                 y = abs(rnorm(10)) * 10, z = abs(rnorm(10)) * 10)
# rank y descending, z ascending
dt[, rank_y := frank(-y), x][, rank_z := frank(z), x][]

    x         y          z rank_y rank_z
 1: a  6.264538 15.1178117      3      4
 2: a  1.836433  3.8984324      5      1
 3: a  8.356286  6.2124058      2      2
 4: a 15.952808 22.1469989      1      5
 5: a  3.295078 11.2493092      4      3
 6: b  8.204684  0.4493361      1      2
 7: b  4.874291  0.1619026      4      1
 8: b  7.383247  9.4383621      2      5
 9: b  5.757814  8.2122120      3      4
10: b  3.053884  5.9390132      5      3

If there are many columns to be ranked individually, some in descending and some in ascending order, we can do this in two steps:

# first, rank the columns that should be in descending order
cols_desc <- c("y")
dt[, paste0("rank_", cols_desc) := lapply(.SD, frankv, ties.method = "min", order = -1L),
   .SDcols = cols_desc, by = x][]
# then, rank the columns that should be in ascending order
cols_asc <- c("z")
dt[, paste0("rank_", cols_asc) := lapply(.SD, frankv, ties.method = "min", order = +1L),
   .SDcols = cols_asc, by = x][]
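
The two steps could probably also be folded into a single call. The following is only a sketch under some assumptions: the named vector `ord` is a hypothetical helper (it is not part of data.table) that maps each column to the `order` argument of frankv(), and Map() is used to pair each column of .SD with its direction.

```r
library(data.table)

# same reproducible data as above
set.seed(1L)
dt <- data.table(x = c(rep("a", 5), rep("b", 5)),
                 y = abs(rnorm(10)) * 10, z = abs(rnorm(10)) * 10)

# hypothetical helper: -1L = descending, +1L = ascending
ord <- c(y = -1L, z = +1L)

# pair every column in .SD with its ranking direction, within each group of x
dt[, paste0("rank_", names(ord)) :=
       Map(function(col, o) frankv(col, order = o, ties.method = "min"), .SD, ord),
   .SDcols = names(ord), by = x][]
```

This keeps the ranking direction next to the column name, so adding another column only means adding one entry to `ord`. Note that `ord` must be in the same order as `.SDcols`, which is why both are derived from `names(ord)` here.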