I am extracting data from Elasticsearch as follows:
> packageVersion("elastic")
[1] '0.7.8'
# data extract
body <- list(query=list(range=list(timestamp=list(gte="2016-10-13", lte="2016-10-15"))))
b3 <- Search(index="myIndex",
sort=c("timestamp:desc"),
fields=c('timestamp','A','B','C','D','E','F','G'),
body=body,
size=3)
The first and second elements are extracted fine (edited to save space):
$hits$hits[[1]]$fields$ F,E,B,G,C,A,D,timestamp
$hits$hits[[2]]$fields$ F,E,B,G,C,A,D,timestamp
The third element is not fully extracted:
$hits$hits[[3]]$fields$ C,A,B,D,timestamp
I followed this post to convert the list to a data frame:
Convert in R output of package Elastic (nested list?) to data.frame or JSON
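The first and second elements load perfectly.
The third element does not load correctly because it was not fully extracted, which leads to the following errors: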
# (optional) verify that all hits expand to the same length
# (should be true for data intended to be in a table format)
stopifnot(
sapply(
b3$hits$hits,
function(x) {!(length(unlist(x)) - length(unlist(b3$hits$hits[[1]])))}
)
)
Error: sapply(b3$hits$hits, function(x) { .... are not all TRUE
# load into the dataframe
# count number of columns, use unlist() to convert
# nested lists to a vector, use the first hit as proxy
nColumns <- length(unlist(b3$hits$hits[[1]]))
# fetch column names ... as above
nNames <- names(unlist(b3$hits$hits[[1]]))
# unlist all hits and convert to matrix with ncol Columns, don't forget byrow=TRUE!
df.b3 <- data.frame(matrix(unlist(b3$hits$hits), ncol=nColumns, byrow=TRUE))
Warning message:
In matrix(unlist(b3$hits$hits), ncol = nColumns, byrow = TRUE) :
data length [33] is not a sub-multiple or multiple of the number of columns [12]
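Note: some records in variables D, E, F and G contain empty (NULL) and ' - ' values. I suspect this may be what causes the extraction problem.
If any of you have run into a similar problem and found a solution, I would appreciate some feedback. Many thanks.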
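The length mismatch can be reproduced without a cluster. The sketch below uses a mock list with made-up values (not the real index) shaped roughly like b3$hits$hits, where the second hit has a NULL field. It is only an illustration of why unlist() yields vectors of different lengths. It also agrees with the warning above: 33 = 12 + 12 + 9, so the third hit contributes three fewer values (5 fields instead of 8).

```r
# Mock hits with hypothetical values; field B is NULL in the second hit,
# and unlist() silently drops NULL entries.
hits <- list(
  list(`_id` = "1", fields = list(A = list(1), B = list(2), timestamp = list("2016-10-15"))),
  list(`_id` = "2", fields = list(A = list(3), B = NULL, timestamp = list("2016-10-14")))
)

# Number of values each hit flattens to
lens <- vapply(hits, function(x) length(unlist(x)), integer(1))
lens

# The same comparison the stopifnot() check performs against the first hit
same_length <- lens == lens[1]
same_length
```

Because the flattened hits differ in length, matrix(..., byrow=TRUE) cannot line the values up into fixed columns.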
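Answer 0 (score: 1)
elastic author here.
We don't try to coerce the output to a data.frame for you, because the output can vary so much that we would often run into errors. But we do let you pass an option on to jsonlite to coerce to a data.frame (via the asdf parameter, for "as data.frame"), because it should never fail.
If dealing with list output, I would use one of dplyr or data.table.
Reproducible example: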
library(elastic)
if (!index_exists("shakespeare")) {
shakespeare <- system.file("examples", "shakespeare_data.json", package = "elastic")
docs_bulk(shakespeare)
}
res <- Search(index="shakespeare", fields=c('play_name','speaker'))
out <- lapply(res$hits$hits, function(x) unlist(x$fields, FALSE))
dplyr
library(dplyr)
bind_rows(out)
#> # A tibble: 10 × 2
#>    play_name speaker
#>    <chr>     <chr>
#>  1 Henry IV
#>  2 Henry IV  KING HENRY IV
#>  3 Henry IV  KING HENRY IV
#>  4 Henry IV  KING HENRY IV
#>  5 Henry IV  KING HENRY IV
#>  6 Henry IV  KING HENRY IV
#>  7 Henry IV  KING HENRY IV
#>  8 Henry IV  KING HENRY IV
#>  9 Henry IV  WESTMORELAND
#> 10 Henry IV  WESTMORELAND
data.table
library(data.table)
rbindlist(out, fill = TRUE, use.names = TRUE)
#>     play_name       speaker
#>  1:  Henry IV
#>  2:  Henry IV KING HENRY IV
#>  3:  Henry IV KING HENRY IV
#>  4:  Henry IV KING HENRY IV
#>  5:  Henry IV KING HENRY IV
#>  6:  Henry IV KING HENRY IV
#>  7:  Henry IV KING HENRY IV
#>  8:  Henry IV KING HENRY IV
#>  9:  Henry IV  WESTMORELAND
#> 10:  Henry IV  WESTMORELAND
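For hits where some fields are missing, as in the question, the important part is padding the absent fields with NA before binding rows. The following is a base R sketch of that idea, not the internals of either package, and the records in out are made-up values rather than real query results.

```r
# Mock per-hit field lists (hypothetical values); the second record lacks B
out <- list(
  list(A = 1, B = 2, timestamp = "2016-10-15"),
  list(A = 3, timestamp = "2016-10-14")
)

# Union of all field names seen across records
all_names <- unique(unlist(lapply(out, names)))

rows <- lapply(out, function(x) {
  x <- x[all_names]                          # absent names come back as NULL entries
  names(x) <- all_names
  x[vapply(x, is.null, logical(1))] <- NA    # pad missing fields with NA
  as.data.frame(x, stringsAsFactors = FALSE)
})

# One row per record, one column per field
df <- do.call(rbind, rows)
df
```

Each record then has the same set of columns, so row binding no longer depends on every hit returning every field.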
Alternatively, use the asdf parameter, which internally tells jsonlite::fromJSON to parse to a data.frame where possible.
res <- Search(index="shakespeare", fields=c('play_name','speaker'), asdf = TRUE)
res$hits$hits$fields
#>    play_name       speaker
#> 1   Henry IV
#> 2   Henry IV KING HENRY IV
#> 3   Henry IV KING HENRY IV
#> 4   Henry IV KING HENRY IV
#> 5   Henry IV KING HENRY IV
#> 6   Henry IV KING HENRY IV
#> 7   Henry IV KING HENRY IV
#> 8   Henry IV KING HENRY IV
#> 9   Henry IV  WESTMORELAND
#> 10  Henry IV  WESTMORELAND
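To see how jsonlite handles records with different keys, here is a small sketch that calls jsonlite::fromJSON directly. The JSON string is simplified, made-up data, not an actual Elasticsearch response, so treat it only as an illustration of the parsing behaviour that asdf relies on.

```r
library(jsonlite)

# Simplified, hypothetical records; the second one has no "B" key
json <- '[{"A": 1, "B": "x", "timestamp": "2016-10-15"},
          {"A": 2, "timestamp": "2016-10-14"}]'

# An array of records is simplified to a data.frame; absent keys become NA
df <- fromJSON(json)
str(df)
```

A real response nests the fields more deeply, so the resulting columns may need further unpacking.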
Using:
R v3.3.2
elastic v0.7.8.9000
Elasticsearch v2.3.4