Is it possible to store a row order in a data.table while keeping its key?
Suppose I have the following dummy table:
library(data.table)
dt <- data.table(id=letters[1:6],
group=sample(c("red", "blue"), replace=TRUE),
value.1=rnorm(6),
value.2=runif(6))
setkey(dt, id)
dt
id group value.1 value.2
1: a blue 1.4557851 0.73249612
2: b red -0.6443284 0.49924102
3: c blue -1.5531374 0.72977197
4: d red -1.5977095 0.08033604
5: e blue 1.8050975 0.43553048
6: f red -0.4816474 0.23658045
I would like to store this table so that its rows are sorted by group and then by value.1, both in decreasing order, i.e.:
> dt[order(group, value.1, decreasing=T),]
id group value.1 value.2
1: f red -0.4816474 0.23658045
2: b red -0.6443284 0.49924102
3: d red -1.5977095 0.08033604
4: e blue 1.8050975 0.43553048
5: a blue 1.4557851 0.73249612
6: c blue -1.5531374 0.72977197
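Obviously I could save this as a new variable, but I also want to keep the id column as the primary key.
Arun's answer to "What is the purpose of setting a key in data.table?" suggests this can be achieved through clever use of setkey, because setkey sorts the data.table by its key columns (although there is no option to set a key in decreasing order):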
> setkey(dt, group, value.1, id)
> dt
id group value.1 value.2
1: c blue -1.5531374 0.72977197
2: a blue 1.4557851 0.73249612
3: e blue 1.8050975 0.43553048
4: d red -1.5977095 0.08033604
5: b red -0.6443284 0.49924102
6: f red -0.4816474 0.23658045
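However, I can then no longer use id as my primary key, because group is the first key column, so a lookup by id is matched against group instead: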
> dt["a"]
group id value.1 value.2
1: a NA NA NA
Answer 0 (score: 3)
It sounds like you just want to modify print.data.table:
print.data.table = function(x, ...) {
# put whatever condition identifies your tables here
if ("group" %in% names(x) && "value.1" %in% names(x)) {
data.table:::print.data.table(x[order(group, value.1, decreasing = T)], ...)
} else {
data.table:::print.data.table(x, ...)
}
}
set.seed(2)
dt = data.table(id=letters[1:6],
group=sample(c("red", "blue"), replace=TRUE),
value.1=rnorm(6),
value.2=runif(6))
setkey(dt, id)
dt
# id group value.1 value.2
#1: a red 0.18484918 0.40528218
#2: e red 0.13242028 0.44480923
#3: c red -1.13037567 0.97639849
#4: b blue 1.58784533 0.85354845
#5: f blue 0.70795473 0.07497942
#6: d blue -0.08025176 0.22582546
dt["c"]
# id group value.1 value.2
#1: c red -1.130376 0.9763985
Answer 1 (score: 0)
I think you can still search by id, like this:
dt[J(unique(group),unique(value.1),"a"), nomatch=0]
group value.1 id value.2
1: blue 0.4928595 a 0.3311728
From what I gather, unique(column_name) is a way of including all the values of that column in the lookup.
I'm not sure whether this helps.
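For comparison, here is a minimal sketch of two lookups by id that do not depend on which column leads the key. It assumes data.table 1.9.6 or later (for the on= argument) and uses a fixed copy of the question's values rather than random draws; it is an alternative to the J(unique(...)) call above, not a check of it.

```r
library(data.table)

# Fixed copy of the question's table so the result is reproducible
dt <- data.table(id      = letters[1:6],
                 group   = rep(c("blue", "red"), 3),
                 value.1 = c(1.4557851, -0.6443284, -1.5531374,
                             -1.5977095, 1.8050975, -0.4816474))

# Key chosen for sorting, as in the question: id is NOT the leading key column
setkey(dt, group, value.1, id)

# Ad hoc join on id: on= names the join column explicitly, bypassing the key
by_on <- dt[.("a"), on = "id"]

# Plain logical subset on id, which also ignores the key
by_scan <- dt[id == "a"]

by_on
by_scan
```

Both calls should return the single row with id "a", whatever the key is.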
Answer 2 (score: 0)
Building on @eddi's answer, I created a hackish solution in which I store an unevaluated call to order as an attribute of the data.table, which print.data.table then obeys:
set_order <- function(dt, cols, decreasing=FALSE) {
# Store a call to order as an additional attribute
attr(dt, "order") <- paste0("order(", paste(cols, collapse=", "),
", decreasing=", decreasing, ")")
invisible(dt)
}
print.data.table = function(x, ...) {
if (!is.null(attr(x, "order"))) {
# Use the stored ordering to print the data.table
data.table:::print.data.table(x[eval(parse(text=attr(x, "order")))], ...)
} else {
data.table:::print.data.table(x, ...)
}
}
This gives me the behaviour I want:
dt <- set_order(dt, c("group", "value.1"), decreasing=T)
dt
# id group value.1 value.2
# 1: f red -0.4816474 0.23658045
# 2: b red -0.6443284 0.49924102
# 3: d red -1.5977095 0.08033604
# 4: e blue 1.8050975 0.43553048
# 5: a blue 1.4557851 0.73249612
# 6: c blue -1.5531374 0.72977197
tables()
# NAME NROW MB COLS KEY
# [1,] dt 6 1 id,group,value.1,value.2 id
# Total: 1MB
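The same idea can be sketched without eval(parse(text = ...)) by storing the column names themselves and building the ordering with do.call. The helper names set_order_spec and ordered_view and the attribute name "order_spec" are hypothetical, made up for this illustration. Unlike the code above, this version leaves print.data.table alone and returns a sorted copy instead.

```r
library(data.table)

# Hypothetical helper: remember the sort columns as an attribute (set by
# reference with setattr), leaving the key and the stored row order untouched
set_order_spec <- function(dt, cols, decreasing = FALSE) {
  stopifnot(all(cols %in% names(dt)))
  setattr(dt, "order_spec", list(cols = cols, decreasing = decreasing))
  invisible(dt)
}

# Hypothetical helper: return a copy of the rows sorted by the stored spec
ordered_view <- function(dt) {
  spec <- attr(dt, "order_spec")
  if (is.null(spec)) return(dt)
  sort_cols <- lapply(spec$cols, function(col) dt[[col]])
  idx <- do.call(order, c(sort_cols, list(decreasing = spec$decreasing)))
  dt[idx]
}

# Fixed copy of the question's table so the result is reproducible
dt <- data.table(id      = letters[1:6],
                 group   = rep(c("blue", "red"), 3),
                 value.1 = c(1.4557851, -0.6443284, -1.5531374,
                             -1.5977095, 1.8050975, -0.4816474))
setkey(dt, id)

set_order_spec(dt, c("group", "value.1"), decreasing = TRUE)
view <- ordered_view(dt)

view$id    # rows in the requested display order
key(dt)    # the original table is still keyed by id
dt["a"]    # keyed lookup by id still works
```

Storing column names rather than a code string avoids parsing text at print time. The cost is that ordered_view(dt) has to be called explicitly whenever the sorted view is wanted.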