Filter an R array based on colnames and rownames

Time: 2016-05-05 20:59:30

Tags: arrays r

I have a 3D array in which the rownames and colnames of the inner matrices are datestamp strings. I want to split it into two 3D arrays:

  • row is after column (NA where the row is before or equal to the column)
  • row is before or equal to column (NA where the row is after the column)

Here is some test data:

dimnames = list(
  c("A", "B"),
  c("2015-12-01", "2016-01-01", "2016-02-01"),
  c("2015-12-01", "2016-01-01", "2016-02-01", "2016-03-01")
  )
v = array(1:24,
  dim = lapply(dimnames,length),
  dimnames = dimnames
  )

Then I want some magic to happen here:

ret = split_it_up(v)
v1 = ret[[1]]
v2 = ret[[2]]

Then v1["A",,] would look like this:

           2015-12-01 2016-01-01 2016-02-01 2016-03-01
2015-12-01         NA         NA         NA         NA
2016-01-01          3         NA         NA         NA
2016-02-01          5         11         NA         NA

And v2["A",,] would look like this:

           2015-12-01 2016-01-01 2016-02-01 2016-03-01
2015-12-01          1          7         13         19
2016-01-01         NA          9         15         21
2016-02-01         NA         NA         17         23

(v1["B",,] and v2["B",,] would be split in the same way.)

Inspired by how lower.tri() works, my best attempt so far operates on a single 2D matrix (e.g. vx = v["A",,]), where I can do this:

matrix(
  as.character(row(vx, as.factor = TRUE)) > as.character(col(vx, as.factor = TRUE)),
  c(3,4))

which gives:

      [,1]  [,2]  [,3]  [,4]
[1,] FALSE FALSE FALSE FALSE
[2,]  TRUE FALSE FALSE FALSE
[3,]  TRUE  TRUE FALSE FALSE

But then I couldn't figure out what to do with it, let alone how to make it work for all slices of the 3D array.
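For a single slice, the mask above can be used directly. The following is only a sketch under the first test data set: it builds the same logical matrix with outer() on the rownames and colnames, then blanks cells with replace(). The names dn, after, vx1 and vx2 are illustrative and not part of the original post. It relies on "YYYY-MM-DD" strings sorting chronologically under plain string comparison.

```r
# Rebuild the first test array from the question.
dn <- list(
  c("A", "B"),
  c("2015-12-01", "2016-01-01", "2016-02-01"),
  c("2015-12-01", "2016-01-01", "2016-02-01", "2016-03-01")
)
v <- array(1:24, dim = lengths(dn), dimnames = dn)
vx <- v["A", , ]

# TRUE where the row date is strictly after the column date.
# Assumption: labels are ISO dates, so string comparison matches date order.
after <- outer(rownames(vx), colnames(vx), ">")

vx1 <- replace(vx, !after, NA)  # keep only cells where row > column
vx2 <- replace(vx, after, NA)   # keep only cells where row <= column
```

This handles one matrix at a time; it does not by itself extend to every slice of the 3D array.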

Update

Here is some different test data, to make sure solutions don't make assumptions about the ordering of the rows and columns.

dimnames = list(
  c("A", "B"),
  c("2016-01-01", "2015-12-01", "2016-02-01"),
  c("2016-02-01", "2015-12-01", "2016-03-01", "2016-01-01")
  )
v = array(1:24,
  dim = lapply(dimnames,length),
  dimnames = dimnames
  )

v["A",,] looks like:

           2016-02-01 2015-12-01 2016-03-01 2016-01-01
2016-01-01          1          7         13         19
2015-12-01          3          9         15         21
2016-02-01          5         11         17         23

v1["A",,] would be:

           2016-02-01 2015-12-01 2016-03-01 2016-01-01
2016-01-01         NA          7         NA         NA
2015-12-01         NA         NA         NA         NA
2016-02-01         NA         11         NA         23

v2["A",,] would be:

           2016-02-01 2015-12-01 2016-03-01 2016-01-01
2016-01-01          1         NA         13         19
2015-12-01          3          9         15         21
2016-02-01          5         NA         17         NA

Another, more extreme example:

dimnames = list(
  c("A", "B"),
  c("2015-10-01", "2015-12-01", "2015-11-01"),
  c("2016-02-01", "2016-04-01", "2016-03-01", "2016-01-01")
  )
v = array(1:24,
  dim = lapply(dimnames,length),
  dimnames = dimnames
  )

Here all columns are greater than all rows. So v1 would be all NAs, and v2 would be identical to v.

1 Answer:

Answer 0: (score: 3)

It seems you are looking for slice.index, to compare the "dimnames" of the last two dimensions:

# using "v" of your second example (first after update)
dnm2 = dimnames(v)[[2]][slice.index(v, 2)]
dnm3 = dimnames(v)[[3]][slice.index(v, 3)]

v1 = replace(v, dnm2 <= dnm3, NA)
v2 = replace(v, dnm2 > dnm3, NA)

v1["A", , ]
#           2016-02-01 2015-12-01 2016-03-01 2016-01-01
#2016-01-01         NA          7         NA         NA
#2015-12-01         NA         NA         NA         NA
#2016-02-01         NA         11         NA         23
v2["A", , ]
#           2016-02-01 2015-12-01 2016-03-01 2016-01-01
#2016-01-01          1         NA         13         19
#2015-12-01          3          9         15         21
#2016-02-01          5         NA         17         NA
v1["B", , ]
v2["B", , ]
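The same idea can be wrapped into the split_it_up() function the question asks for. This is a minimal sketch rather than a definitive implementation: the function name comes from the question, while the body, the helper names row_lab, col_lab and after, and the input check are assumptions added here. It is tried on the "extreme" example, where every column is later than every row.

```r
# Hypothetical wrapper around the slice.index approach shown above.
split_it_up <- function(v) {
  stopifnot(length(dim(v)) == 3, !is.null(dimnames(v)))
  # slice.index(v, k) gives, for every cell, its index along dimension k;
  # indexing the dimnames with it yields each cell's row/column label.
  row_lab <- dimnames(v)[[2]][slice.index(v, 2)]
  col_lab <- dimnames(v)[[3]][slice.index(v, 3)]
  # Assumption: labels are "YYYY-MM-DD" strings, so string order is date order.
  after <- row_lab > col_lab
  list(
    replace(v, !after, NA),  # v1: keep cells where row is after column
    replace(v, after, NA)    # v2: keep cells where row is before or equal
  )
}

# The "extreme" example from the question.
dn <- list(
  c("A", "B"),
  c("2015-10-01", "2015-12-01", "2015-11-01"),
  c("2016-02-01", "2016-04-01", "2016-03-01", "2016-01-01")
)
v <- array(1:24, dim = lengths(dn), dimnames = dn)

ret <- split_it_up(v)
v1 <- ret[[1]]
v2 <- ret[[2]]

all(is.na(v1))    # no row is after any column, so v1 is entirely NA
identical(v2, v)  # nothing is masked in v2
```

Because the comparison is done per cell on the labels themselves, the result does not depend on the order in which rows and columns appear.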