I have a data.table like this:
test <- data.table(city = c("Berlin", "Berlin", "Berlin", "Amsterdam", "Amsterdam"),
                   key1 = c("A", "A", "A", "B", "B"),
                   value1 = c(1, 2, 3, 4, 5),
                   value2 = c(0.1, 0.2, 0.3, 0.4, 0.5),
                   kpi = c(10, 15, 20, 25, 30))
I want to upload this data to Elasticsearch, but with a specific structure:
library(RJSONIO)
res <- test[, .(factors = toJSON(.SD)),
            by = .(city, key1),
            .SDcols = c("value1", "kpi")]
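This code creates a separate JSON string for each group in the factors column. Since I want to get rid of the \n sequences that the library inserts, I can strip them in the assignment: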
res <- test[, .(factors = gsub("\n", "", toJSON(.SD))),
            by = .(city, key1),
            .SDcols = c("value1", "kpi")]
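The problem appears when I try to upload this object to Elasticsearch (I am using the elastic package). Because R uses a backslash to escape double quotes inside strings, when I use: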
docs_bulk(res, "index")
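it writes the inner double quotes as \" (instead of ") in the string field created with toJSON, that is, around value1 and kpi. This can also be checked by writing the object to a file: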
write(toJSON(res), "~/output.json")
{
"city": [ "Berlin", "Amsterdam" ],
"key1": [ "A", "B" ],
"factors": [ "{ \"value1\": [1, 2, 3 ],\"kpi\": [10, 15, 20 ] }", "{ \"value1\": [ 4, 5 ],\"kpi\": [25, 30 ] }" ]
}
Since the names value1 and kpi begin and end with \", Elasticsearch does not parse these fields as separate arrays. What I want is something like this:
{
"city": [ "Berlin", "Amsterdam" ],
"key1": [ "A", "B" ],
"factors": [ { "value1": [1, 2, 3 ],"kpi": [10, 15, 20 ] }, { "value1": [4, 5 ],"kpi": [25, 30 ] } ]
}
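I have tried several combinations of regular expressions with gsub, but I cannot stop R from writing the backslashes. My last resort is to write the object to a file and fix it up with sed, but I think there should be a simpler way. Any help would be appreciated.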
Answer 0 (score: 3)
The res object could probably be built with less code, but anyway:
library(elastic)
library(data.table)
library(jsonlite)
test <- data.table(city = c("Berlin", "Berlin", "Berlin", "Amsterdam", "Amsterdam"),
                   key1 = c("A", "A", "A", "B", "B"),
                   value1 = c(1, 2, 3, 4, 5),
                   value2 = c(0.1, 0.2, 0.3, 0.4, 0.5),
                   kpi = c(10, 15, 20, 25, 30))
res <- test[, .(factors = jsonlite::toJSON(.SD, dataframe = "columns")),
            by = .(city, key1),
            .SDcols = c("value1", "kpi")]
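# Turn each row into a list and parse the JSON string back into an R list,
# so that factors is sent as a nested object rather than an escaped string
res <- lapply(apply(res, 1, as.list), function(z) {
  tt <- z[!names(z) %in% "factors"]
  tt$factors <- fromJSON(z$factors)
  tt
})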
docs_bulk(res, "mycoolindex")
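To see why the fromJSON step matters, here is a minimal sketch that needs only jsonlite and no running Elasticsearch. The variable names (json_string, as_string, as_object) are made up for illustration. It serializes the same content twice: once while it is still a character string, and once after parsing it back into an R list.

```r
library(jsonlite)

# A JSON document held as a plain character string, like one cell of the factors column
json_string <- '{"value1":[4,5],"kpi":[25,30]}'

# Serializing the string as-is: the inner double quotes have to be escaped
as_string <- toJSON(list(city = "Amsterdam", factors = json_string),
                    auto_unbox = TRUE)

# Parsing the string first gives an R list, which serializes as a nested object
as_object <- toJSON(list(city = "Amsterdam", factors = fromJSON(json_string)),
                    auto_unbox = TRUE)

cat(as_string, "\n")
# {"city":"Amsterdam","factors":"{\"value1\":[4,5],\"kpi\":[25,30]}"}
cat(as_object, "\n")
# {"city":"Amsterdam","factors":{"value1":[4,5],"kpi":[25,30]}}
```

The backslashes are therefore not something gsub can remove from the R side: they are added at serialization time because the value is a string. Handing the serializer a list instead of a string avoids them.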
Querying the index from the shell shows the nested structure:
curl 'http://localhost:9200/mycoolindex/_search?size=1' | jq .
#> {
#> "took": 13,
#> "timed_out": false,
#> "_shards": {
#> "total": 5,
#> "successful": 5,
#> "failed": 0
#> },
#> "hits": {
#> "total": 2,
#> "max_score": 1,
#> "hits": [
#> {
#> "_index": "mycoolindex",
#> "_type": "mycoolindex",
#> "_id": "AVeay0KnlE0U0vVWYXkb",
#> "_score": 1,
#> "_source": {
#> "city": [
#> "Amsterdam"
#> ],
#> "key1": [
#> "B"
#> ],
#> "factors": {
#> "value1": [
#> 4,
#> 5
#> ],
#> "kpi": [
#> 25,
#> 30
#> ]
#> }
#> }
#> }
#> ]
#> }
#> }
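On the "less code" remark: one possible shortcut, offered only as a sketch, is to skip the JSON round trip and build the nested lists directly from the grouped data. This assumes a data.table version that provides split() with a by argument; the names groups and docs are hypothetical. I have only checked the resulting structure locally with jsonlite, not by uploading it, so treat passing docs to docs_bulk as an assumption to verify.

```r
library(data.table)
library(jsonlite)

test <- data.table(city = c("Berlin", "Berlin", "Berlin", "Amsterdam", "Amsterdam"),
                   key1 = c("A", "A", "A", "B", "B"),
                   value1 = c(1, 2, 3, 4, 5),
                   value2 = c(0.1, 0.2, 0.3, 0.4, 0.5),
                   kpi = c(10, 15, 20, 25, 30))

# One data.table per (city, key1) group
groups <- split(test, by = c("city", "key1"))

# Build one nested list per group; unname() drops the group labels
# so the result is a plain list of documents
docs <- unname(lapply(groups, function(g) {
  list(
    city = g$city[1],
    key1 = g$key1[1],
    factors = list(value1 = g$value1, kpi = g$kpi)
  )
}))

# Inspect the structure locally before uploading anything
cat(toJSON(docs, auto_unbox = TRUE, pretty = TRUE), "\n")

# Assumed to work the same way as in the answer above:
# docs_bulk(docs, "mycoolindex")
```

Note that here city and key1 are scalars in each document, whereas the output above shows them as one-element arrays.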