How to remove \" when writing an R character object to JSON

Time: 2016-10-06 13:20:32

Tags: json r elasticsearch

I have a data.table like this:

test <- data.table(city = c("Berlin", "Berlin", "Berlin", "Amsterdam", "Amsterdam"),
                   key1 = c("A", "A", "A", "B", "B"),
                   value1 = c(1, 2, 3, 4, 5),
                   value2 = c(0.1, 0.2, 0.3, 0.4, 0.5),
                   kpi = c(10, 15, 20, 25, 30))

I want to upload this data to Elasticsearch, but with a specific structure:

library(RJSONIO)
res <-test[, .(factors = toJSON(.SD)), 
             by = .(city, key1), 
             .SDcols = c("value1", "kpi")]

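This code creates one JSON string per group in the column factors. Since I want to get rid of the \n sequences that the library introduces, I can replace them in the assignment: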

res <-test[, .(factors = gsub("\n", "", toJSON(.SD))), 
         by = .(city, key1), 
         .SDcols = c("value1", "kpi")]

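The problem appears when I want to upload this object to Elasticsearch (I use the elastic package). Because R uses backslashes to escape double quotes inside strings, when I write the object with:

docs_bulk(res, "index")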

it writes \" instead of " inside the string field created with toJSON (around value1 and kpi). This can also be checked by writing the object to a file:

write(toJSON(res), "~/output.json")

{
 "city": [ "Berlin", "Amsterdam" ],
"key1": [ "A", "B" ],
"factors": [ "{ \"value1\": [1, 2, 3 ],\"kpi\": [10, 15, 20 ] }", "{ \"value1\": [ 4, 5 ],\"kpi\": [25, 30 ] }" ] 
}

Because the names value1 and kpi start and end with \", Elasticsearch does not parse these fields as separate arrays. What I want is something like this:

{
 "city": [ "Berlin", "Amsterdam" ],
"key1": [ "A", "B" ],
"factors": [ { "value1": [1, 2, 3 ],"kpi": [10, 15, 20 ] }, { "value1": [4, 5 ],"kpi": [25, 30 ] } ] 
}
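
The difference between the two outputs above can be reproduced in isolation. The following is a minimal sketch, assuming the jsonlite package is available; the variable names inner, as_string and as_object are made up for illustration. It suggests that the backslashes come from serializing a string that already contains JSON, not from anything gsub could strip out of the R object.

```r
library(jsonlite)

# A JSON document held as an ordinary R character string (illustrative data).
inner <- '{"value1":[1,2,3],"kpi":[10,15,20]}'

# Serializing the string again treats it as plain text,
# so the quotes inside it have to be escaped with backslashes.
as_string <- toJSON(list(factors = inner), auto_unbox = TRUE)

# Parsing the string first gives an R list, which is then
# serialized as a nested JSON object without any escaping.
as_object <- toJSON(list(factors = fromJSON(inner)), auto_unbox = TRUE)

cat(as_string, "\n")
cat(as_object, "\n")
```

Printing both values side by side should show the escaped form for as_string and the nested form for as_object.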

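I tried several different gsub regex combinations, but I could not stop R from writing the backslashes. My last resort is to write the object to a file and post-process it with sed, but I think there should be an easier way. Any help would be appreciated.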

1 answer:

Answer 0: (score: 3)

Okay, I think this should do it. There is probably a way to bulk load the final res object with less code, but anyway:

library(elastic)
library(data.table)
library(jsonlite)

test <- data.table(city = c("Berlin", "Berlin", "Berlin", "Amsterdam", "Amsterdam"),
                   key1 = c("A", "A", "A", "B", "B"),
                   value1 = c(1, 2, 3, 4, 5),
                   value2 = c(0.1, 0.2, 0.3, 0.4, 0.5),
                   kpi = c(10, 15, 20, 25, 30))

res <- test[, .(factors = jsonlite::toJSON(.SD, dataframe = "columns")), 
           by = .(city, key1), 
           .SDcols = c("value1", "kpi")]

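# Turn each row into a named list, then parse the JSON string in `factors`
# back into an R list so it is sent as a nested object, not as a string.
res <- lapply(apply(res, 1, as.list), function(z) {
  tt <- z[!names(z) %in% "factors"]  # keep city and key1 unchanged
  tt$factors <- fromJSON(z$factors)  # JSON string -> list(value1, kpi)
  tt
})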

docs_bulk(res, "mycoolindex")

curl 'http://localhost:9200/mycoolindex/_search?size=1' | jq .
#> {
#>   "took": 13,
#>   "timed_out": false,
#>   "_shards": {
#>     "total": 5,
#>     "successful": 5,
#>     "failed": 0
#>   },
#>   "hits": {
#>     "total": 2,
#>     "max_score": 1,
#>     "hits": [
#>       {
#>         "_index": "mycoolindex",
#>         "_type": "mycoolindex",
#>         "_id": "AVeay0KnlE0U0vVWYXkb",
#>         "_score": 1,
#>         "_source": {
#>           "city": [
#>             "Amsterdam"
#>           ],
#>           "key1": [
#>             "B"
#>           ],
#>           "factors": {
#>             "value1": [
#>               4,
#>               5
#>             ],
#>             "kpi": [
#>               25,
#>               30
#>             ]
#>           }
#>         }
#>       }
#>     ]
#>   }
#> }
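
As a possible shortcut, the nested list can also be built directly from the grouped data, without producing JSON strings and parsing them back. This is only a sketch under assumptions: it relies on the split method for data.table with a by argument (available in recent data.table versions), the names groups and docs are hypothetical, and it stops at printing the JSON instead of calling docs_bulk.

```r
library(data.table)
library(jsonlite)

test <- data.table(city = c("Berlin", "Berlin", "Berlin", "Amsterdam", "Amsterdam"),
                   key1 = c("A", "A", "A", "B", "B"),
                   value1 = c(1, 2, 3, 4, 5),
                   value2 = c(0.1, 0.2, 0.3, 0.4, 0.5),
                   kpi = c(10, 15, 20, 25, 30))

# One data.table per (city, key1) combination.
groups <- split(test, by = c("city", "key1"))

# Build one document per group; `factors` stays an R list,
# so nothing inside it is ever stored as a JSON string.
docs <- unname(lapply(groups, function(g) {
  list(city = g$city[1],
       key1 = g$key1[1],
       factors = list(value1 = g$value1, kpi = g$kpi))
}))

# Inspect the structure that would be serialized.
cat(toJSON(docs, auto_unbox = TRUE, pretty = TRUE), "\n")
```

A list shaped like docs has the same layout as the res list passed to docs_bulk above, except that city and key1 are serialized as scalars here because of auto_unbox = TRUE.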