I want to convert a JSON file into a data frame in R. With the following code:
link <- 'https://www.dropbox.com/s/ckfn1fpkcix1ccu/bevingenbag.json'
document <- fromJSON(file = link, method = 'C')
bev <- do.call("cbind", document)
I get this:
type features
1 FeatureCollection list(type = "Feature", geometry = list(type = "Point", coordinates = c(6.54800000288927, 52.9920000044505)), properties = list(gid = "1496600", yymmdd = "19861226", lat = "52.992", lon = "6.548", mag = "2.8", depth = "1.0", knmilocatie = "Assen", baglocatie = "Assen", tijd = "74751"))
This is the first row of the matrix. All other rows have the same structure. I am interested in the properties = list(gid = "1496600", yymmdd = "19861226", lat = "52.992", lon = "6.548", mag = "2.8", depth = "1.0", knmilocatie = "Assen", baglocatie = "Assen", tijd = "74751") part, which should be converted into a data frame with the columns gid, yymmdd, lat, lon, mag, depth, knmilocatie, baglocatie, tijd.
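I searched for and tried several solutions, but none of them worked. I used the rjson package. I also tried the RJSONIO and jsonlite packages, but could not extract the information I need.
Does anyone know how to solve this?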
Answer 0: (score: 4)
Here is one way to get the data frame:
library(rjson)
document <- fromJSON(file = "bevingenbag.json", method = 'C')
dat <- do.call(rbind, lapply(document$features,
function(x) data.frame(x$properties)))
Edit: how to replace empty values with NA:
dat$baglocatie[dat$baglocatie == ""] <- NA
The result:
head(dat)
gid yymmdd lat lon mag depth knmilocatie baglocatie tijd
1 1496600 19861226 52.992 6.548 2.8 1.0 Assen Assen 74751
2 1496601 19871214 52.928 6.552 2.5 1.5 Hooghalen Hooghalen 204951
3 1496602 19891201 52.529 4.971 2.7 1.2 Purmerend Kwadijk 200914
4 1496603 19910215 52.771 6.914 2.2 3.0 Emmen Emmen 21116
5 1496604 19910425 52.952 6.575 2.6 3.0 Geelbroek Ekehaar 102631
6 1496605 19910808 52.965 6.573 2.7 3.0 Eleveld Assen 40114
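For readers who do not have the file at hand, the same steps can be sketched on a small hand-built list. The `document` object below is a hypothetical stand-in for what `fromJSON` returns: two features copied from the output above, with the second `baglocatie` blanked out on purpose to show the NA replacement. It also uses `stringsAsFactors = FALSE`, which is an addition to the code above, so the columns stay character regardless of R version.

```r
# Hypothetical stand-in for the parsed JSON (not read from the real file).
# The empty baglocatie in the second feature is made up for illustration.
document <- list(
  type = "FeatureCollection",
  features = list(
    list(type = "Feature",
         geometry = list(type = "Point", coordinates = c(6.548, 52.992)),
         properties = list(gid = "1496600", yymmdd = "19861226",
                           lat = "52.992", lon = "6.548", mag = "2.8",
                           depth = "1.0", knmilocatie = "Assen",
                           baglocatie = "Assen", tijd = "74751")),
    list(type = "Feature",
         geometry = list(type = "Point", coordinates = c(6.552, 52.928)),
         properties = list(gid = "1496601", yymmdd = "19871214",
                           lat = "52.928", lon = "6.552", mag = "2.5",
                           depth = "1.5", knmilocatie = "Hooghalen",
                           baglocatie = "", tijd = "204951"))
  )
)

# Build one single-row data frame per feature, then stack them with rbind.
dat <- do.call(rbind, lapply(document$features, function(x) {
  data.frame(x$properties, stringsAsFactors = FALSE)
}))

# Replace empty strings with NA in every column at once.
dat[dat == ""] <- NA

print(dat)
```

Replacing across the whole data frame rather than only `baglocatie` is a design choice of this sketch. It is convenient when more than one column can contain empty strings.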
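Answer 1: (score: 4)
This is just another, very similar approach.
@SvenHohenstein's approach creates a data frame at every step, which is an expensive operation. Building vectors and converting the whole result to a data frame once at the end is much faster. In addition, Sven's approach turns every column into a factor (the default in R versions before 4.0.0, where stringsAsFactors is TRUE), which may or may not be what you want. The approach below runs about 200 times faster in the benchmark below, which can matter if you plan to do this repeatedly. Finally, you will need to convert the columns lon, lat, mag, and depth to numeric.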
library(microbenchmark)
library(rjson)
document <- fromJSON(file = "bevingenbag.json", method = 'C')
json2df.1 <- function(json){ # @SvenHohenstein approach
df <- do.call(rbind, lapply(json$features,
function(x) data.frame(x$properties, stringsAsFactors=F)))
return(df)
}
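json2df.2 <- function(json){
  # Bind the property lists into one matrix-like object, one row per feature
  df <- do.call(rbind, lapply(json[["features"]], function(x){c(x$properties)}))
  # Convert every column to character and build the data frame once
  # (the original referred to an undefined object `result` here)
  df <- data.frame(apply(df, 2, as.character), stringsAsFactors=F)
  return(df)
}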
microbenchmark(x<-json2df.1(document), y<-json2df.2(document), times=10)
# Unit: milliseconds
# expr min lq median uq max neval
# x <- json2df.1(document) 2304.34378 2654.95927 2822.73224 2977.75666 3227.30996 10
# y <- json2df.2(document) 13.44385 15.27091 16.78201 18.53474 19.70797 10
identical(x,y)
# [1] TRUE
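The final conversion of lon, lat, mag, and depth to numeric could look roughly like this. It is a minimal sketch: the two-row `dat` is a hypothetical stand-in for the character data frame returned by the functions above, and `num_cols` is a name introduced here for illustration.

```r
# Hypothetical two-row stand-in for the all-character data frame
dat <- data.frame(
  gid   = c("1496600", "1496601"),
  lat   = c("52.992", "52.928"),
  lon   = c("6.548", "6.552"),
  mag   = c("2.8", "2.5"),
  depth = c("1.0", "1.5"),
  stringsAsFactors = FALSE
)

# Columns that hold numbers stored as text
num_cols <- c("lon", "lat", "mag", "depth")

# as.character() first keeps this correct even if the columns are factors,
# where as.numeric() alone would return the internal level codes.
dat[num_cols] <- lapply(dat[num_cols], function(col) {
  as.numeric(as.character(col))
})

str(dat)
```

The `gid` column is left as character here on the assumption that it is an identifier, not a quantity.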