Quickly converting many rows to JSON strings

Time: 2016-03-12 16:09:59

Tags: r data.table

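I have ~15 data.frames, each with 100K-300K rows. For each value of the variable v, I want to condense the other columns into a single character string in JSON format, for compact storage. Note that each group in v has multiple rows (1 or more; likely more). The code below uses the jsonlite package to do the conversion, but because of how I set up the splitting it is slow and not memory efficient. How can I do this faster and with less memory? I do not need to use jsonlite; it is simply the only way I knew how to do this. I suspect there is a fast way to build the JSON strings directly with data.table, but I can't work out how.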

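PS: In case the motivation helps: I'm doing this to get a hash table in which I can look up v and then convert the JSON back into an R data.frame. Maybe there is a more direct way to use jsonlite than what I have, but toJSON(dat) is not what I'm after.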

MWE

set.seed(10)

dat <- data.frame(
    v = rep(c('red', 'blue'), each =3),
    w = sample(LETTERS, 6),
    x = sample(1:3, 6, T),
    y = sample(1:3, 6, T),
    z = sample(1:3, 6, T),
    stringsAsFactors = FALSE
)

dat

Data view

     v w x y z
1  red N 1 1 2
2  red H 1 2 3
3  red K 2 2 3
4 blue P 2 2 2
5 blue B 2 1 3
6 blue E 2 1 2

Conversion

library(jsonlite)
jsonlist <- lapply(split(dat[-1], dat$v), function(x) as.character(toJSON(x)))

data.frame(
    v = names(jsonlist),
    json = unlist(jsonlist, use.names=FALSE),
    stringsAsFactors = FALSE
)

Desired result

      v                                                                                  json
1 blue [{"w":"P","x":2,"y":2,"z":2},{"w":"B","x":2,"y":1,"z":3},{"w":"E","x":2,"y":1,"z":2}]
2  red [{"w":"N","x":1,"y":1,"z":2},{"w":"H","x":1,"y":2,"z":3},{"w":"K","x":2,"y":2,"z":3}]

2 answers:

Answer 0 (score: 5)

With data.table, you can group by v and pass .SD to toJSON:

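library(data.table)
library(jsonlite)  # provides toJSON()
setDT(dat)         # convert dat to a data.table by reference
dat[, toJSON(.SD), by = v]  # .SD holds each group's non-grouping columns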
#      v                                                                                    V1
#1:  red [{"w":"N","x":1,"y":1,"z":2},{"w":"H","x":1,"y":2,"z":3},{"w":"K","x":2,"y":2,"z":3}]
#2: blue [{"w":"P","x":2,"y":2,"z":2},{"w":"B","x":2,"y":1,"z":3},{"w":"E","x":2,"y":1,"z":2}]
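
The question's stated goal is to look up v and turn the JSON back into a data.frame. A minimal sketch of that round trip, building on the grouped call above, might look like the following. The names lookup and red are illustrative and not from the original post, and the data is typed in by hand to mirror the question's data view.

```r
library(data.table)
library(jsonlite)

# Toy data copied from the question's data view
dat <- data.table(
    v = rep(c("red", "blue"), each = 3),
    w = c("N", "H", "K", "P", "B", "E"),
    x = c(1L, 1L, 2L, 2L, 2L, 2L),
    y = c(1L, 2L, 2L, 2L, 1L, 1L),
    z = c(2L, 3L, 3L, 2L, 3L, 2L)
)

# One JSON string per value of v; as.character() drops the "json" class
# so the column is stored as plain character
lookup <- dat[, .(json = as.character(toJSON(.SD))), by = v]

# Key the table on v so lookups by v use a binary search
setkey(lookup, v)

# Look up one key and parse its JSON back into a data.frame
red <- fromJSON(lookup[.("red"), json])
red
```

One caveat for the round trip: toJSON() rounds doubles to 4 decimal digits by default (its digits argument), so numeric columns that need more precision would likely want digits = NA.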

Answer 1 (score: 1)

I'm still not convinced that what you're doing makes sense, but:

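library(dplyr)
library(jsonlite)

dat %>%
  group_by(v) %>%
  do(json = select(., -v) %>% toJSON ) %>%  # one JSON value per group, held in a list column
  mutate(json = unlist(json))               # flatten the list column to a character column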
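
Since do() has been superseded in more recent dplyr releases, the same grouping idea can be sketched with group_modify(). This is an alternative phrasing, not the answerer's code. It assumes a dplyr version that provides group_modify(), and the name res is illustrative.

```r
library(dplyr)
library(jsonlite)

# Toy data copied from the question's data view
dat <- data.frame(
    v = rep(c("red", "blue"), each = 3),
    w = c("N", "H", "K", "P", "B", "E"),
    x = c(1L, 1L, 2L, 2L, 2L, 2L),
    y = c(1L, 2L, 2L, 2L, 1L, 1L),
    z = c(2L, 3L, 3L, 2L, 3L, 2L),
    stringsAsFactors = FALSE
)

res <- dat %>%
  group_by(v) %>%
  # .x is the current group's rows without the grouping column v;
  # the function must return a data frame, here a single row with one string
  group_modify(~ data.frame(json = as.character(toJSON(.x)),
                            stringsAsFactors = FALSE)) %>%
  ungroup()

res
```

The result has one row per value of v with a plain character json column, so no unlist() step is needed afterwards.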