I am running into some problems with the OLAP cube package data.cube:
install.packages("data.cube", repos = paste0("https://", c(
"jangorecki.gitlab.io/data.cube",
"cloud.r-project.org"
)))
Some test data:
library(data.table)
set.seed(42)
dt <- CJ(color = c("green","yellow","red"),
year = 2011:2015,
month = 1:12,
status = c("active","inactive","archived","removed")
)[sample(600)]
dt[, "value" := sample(4:7/2, nrow(dt), TRUE)]
Now I want to create a cube and apply a hierarchy on the time dimension, like this:
library(data.cube)
dc <- as.data.cube(dt, id.vars = c("color", "year", "month", "status"),
measure.vars = "value",
hierarchies = list(time <- list("year, month")))
If I run this code, I get the following error:
Error in as.data.cube.data.table(dt, id.vars = c("color", "year", "month", :
identical(names(hierarchies), id.vars) | identical(names(hierarchies), .... is not TRUE
If I try something like
hierarchies = list(time <- list("year, month"), color <- list("color"),
status <- list("status"))
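I get the same error.
Answer 0 (score: 3)
Well-written question.
I see that you based your example on the examples in ?as.data.cube, so I will use that example to answer your question as well.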
# Original example goes as follows
library(data.cube)
library(data.table)
set.seed(1L)
dt = CJ(color = c("green","yellow","red"),
year = 2011:2015,
status = c("active","inactive","archived","removed"))[sample(30)]
dt[, "value" := sample(4:7/2, nrow(dt), TRUE)]
dc = as.data.cube(
x = dt, id.vars = c("color","year","status"),
measure.vars = "value",
hierarchies = sapply(c("color","year","status"),
function(x) list(setNames(list(character()), x)),
simplify=FALSE)
)
str(dc)
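Your error appears to be raised while the validity of the hierarchies is being checked.
Unfortunately the error message is not very meaningful. I have created issue #18 so that it can be improved someday.
So let's compare your manual hierarchy with the one created in the example.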
sapply(c("color","year","status"),
function(x) list(setNames(list(character()), x)),
simplify=FALSE) -> h
str(h)
#List of 3
# $ color :List of 1
# ..$ :List of 1
# .. ..$ color: chr(0)
# $ year :List of 1
# ..$ :List of 1
# .. ..$ year: chr(0)
# $ status:List of 1
# ..$ :List of 1
# .. ..$ status: chr(0)
hierarchies = list(time <- list("year, month"), color <- list("color"),
status <- list("status"))
str(hierarchies)
#List of 3
# $ :List of 1
# ..$ : chr "year, month"
# $ :List of 1
# ..$ : chr "color"
# $ :List of 1
# ..$ : chr "status"
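We can see that the hierarchy from the manual is a list of named elements, while yours is a list of unnamed elements.
I believe you used <- where = should have been used. The <- operator is not always equivalent to =. You can read more about this case in 3.1.3.1 Assignment <- vs =.
So let's see whether that fix is enough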
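To make the difference concrete, here is a minimal base-R sketch (it does not use data.cube, and the object names `unnamed` and `named` are only for illustration). Inside a function call, `<-` performs an assignment and passes the resulting value positionally, so the list elements end up without names:

```r
# `<-` inside list(): each argument is an assignment expression,
# so the values are passed positionally and the list has no names.
unnamed <- list(time <- list("year, month"), color <- list("color"))
names(unnamed)
# NULL

# `=` inside list(): each argument is a name/value pair.
named <- list(time = list("year, month"), color = list("color"))
names(named)
# "time" "color"

# Side effect of the first call: `time` and `color` now exist as
# variables in the calling environment.
is.list(time)
# TRUE
```

The side effect is easy to miss: after the first call, `time` refers to a list in your workspace rather than to the `time()` function from the stats package.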
hierarchies = list(time = list(c("year, month")), color = list("color"),
status = list("status"))
dc <- as.data.cube(dt, id.vars = c("color", "year", "month", "status"),
measure.vars = "value",
hierarchies = hierarchies)
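We still get the same error, so the names were needed but were not the root cause of the problem. Looking more closely, I now see that you are trying to build the time dimension without a primary key.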
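As a rough illustration of why the call still fails, the first condition visible in the error message can be evaluated by hand. This is only a sketch of that one comparison in base R; the second condition is truncated in the message, so it is not reproduced here, and the variable names `check_manual`, `doc_vars`, `doc_h` and `check_doc` are made up for the example.

```r
# The id.vars and the (now named) hierarchies from the failing call.
id.vars <- c("color", "year", "month", "status")
hierarchies <- list(time = list(c("year, month")),
                    color = list("color"),
                    status = list("status"))

names(hierarchies)
# "time" "color" "status"

# First condition shown in the error message: names must match id.vars.
check_manual <- identical(names(hierarchies), id.vars)
check_manual
# FALSE

# The hierarchies built in the documentation example pass the same comparison.
doc_vars <- c("color", "year", "status")
doc_h <- sapply(doc_vars,
                function(x) list(setNames(list(character()), x)),
                simplify = FALSE)
check_doc <- identical(names(doc_h), doc_vars)
check_doc
# TRUE
```

The names differ in both content and length: there is no id variable called `time`, and `year` and `month` have no hierarchy entry of their own.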
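Important note: you cannot pass multiple column names as a single string
"year, month"
It should be written as
c("year","month")
Even so, the time dimension needs a single field as its primary key; year and month are just attributes of that key.
So let's create a primary key for the time dimension. Since our time dimension has year-month granularity, we will create the key at that granularity.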
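A quick base-R check shows why the single string cannot stand in for two column names. This is just an illustrative sketch; `cols_wrong`, `cols_right` and the one-row data frame `df` are hypothetical names, not part of the question's data.

```r
cols_wrong <- "year, month"        # one string that merely contains a comma
cols_right <- c("year", "month")   # a character vector of two column names

length(cols_wrong)
# 1
length(cols_right)
# 2

# Only the vector form matches actual column names.
df <- data.frame(year = 2011L, month = 5L)
cols_wrong %in% names(df)
# FALSE
all(cols_right %in% names(df))
# TRUE

# If you are handed such a string, it can be split into a proper vector.
split_cols <- trimws(strsplit(cols_wrong, ",", fixed = TRUE)[[1]])
split_cols
# "year" "month"
```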
library(data.table)
set.seed(42)
dt <- CJ(color = c("green","yellow","red"),
year = 2011:2015,
month = 1:12,
status = c("active","inactive","archived","removed")
)[sample(600)
][, yearmonth:=sprintf("%04d%02d", year, month) # zero-pads the key: four digits for year, two for month
]
dt[, "value" := sample(4:7/2, nrow(dt), TRUE)]
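Before using `yearmonth` as the key of the time dimension, it can be worth checking that it behaves like a primary key, meaning one distinct value per (year, month) pair. The following is a small base-R sketch of such a check; it rebuilds the year and month combinations with `expand.grid` instead of using the `dt` above, and `grid` is a name chosen only for this example.

```r
# All year/month combinations used in the test data.
grid <- expand.grid(year = 2011:2015, month = 1:12)
grid$yearmonth <- sprintf("%04d%02d", grid$year, grid$month)

# A valid key has no duplicates and one value per (year, month) pair.
anyDuplicated(grid$yearmonth)
# 0
length(unique(grid$yearmonth)) == nrow(unique(grid[c("year", "month")]))
# TRUE

# Zero padding keeps every key the same width, so sorting the keys
# as text also sorts them chronologically.
all(nchar(grid$yearmonth) == 6L)
# TRUE
sort(c("201110", "201102"))   # padded: February before October
sort(paste0(2011, c(2, 10)))  # unpadded: "201110" sorts before "20112"
```

Note that `sprintf()` returns character values, so the key created this way is a character column.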
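Now let's set up the hierarchies. Note that year has been replaced by yearmonth.
In the hierarchy below, the vector c("year","month") means that these attributes depend on yearmonth. See ?as.data.cube for more examples that cover more complex hierarchies.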
hierarchies = list(
color = list(color = list(color = character())),
yearmonth = list(yearmonth = list(yearmonth = c("year","month"))),
status = list(status = list(status = character()))
)
dc = as.data.cube(
x = dt, id.vars = c("color","yearmonth","status"),
measure.vars = "value",
hierarchies = hierarchies
)
str(dc)
Our data.cube has been created successfully. Let's try querying it by yearmonth
dc[, .(yearmonth=201105L)] -> d
as.data.table(d)
dc[, .(yearmonth=201105L), drop=FALSE] -> d
as.data.table(d)
Now try querying it using the attributes of the dimension: year, month, and both together
dc[, .(year=2011L)] -> d
as.data.table(d) # note that the dimension is not dropped because it still has more than one value
dc[, .(month=5L)] -> d
as.data.table(d)
dc[, .(year=2011L, month=5L)] -> d
as.data.table(d) # here the dimension has been dropped because only a single element remained in it; you can of course use `drop=FALSE` if needed
Hope that helps, and good luck!