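How do I extract values from a JSON blob into new columns of an R data frame?

Time: 2015-09-17 18:25:04

Tags: json r

I have run into the following problem:

My data frame has a variable (var2) that contains JSON objects: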

  var1                                  var2
1    1 {"property1": "val1", "property2": 5}
2    2 {"property1": "val2", "property2": 8}
3    3 {"property1": "val3", "property2": 7}
4    4 {"property1": "val4", "property2": 0}
5    5 {"property1": "val5", "property3": 9}

(Code on pastebin: here)

I would like to extract the JSON properties in var2 and add them to the data frame as new columns, like this:

  var1                                  var2 prop1 prop2 prop3
1    1 {"property1": "val1", "property2": 5}  val1     5    NA
2    2 {"property1": "val2", "property2": 8}  val2     8    NA
3    3 {"property1": "val3", "property2": 7}  val3     7    NA
4    4 {"property1": "val4", "property2": 0}  val4     0    NA
5    5 {"property1": "val5", "property3": 9}  val5    NA     9

As long as every record holds the same properties in the same order, I found that this approach works:

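library(jsonlite)  # fromJSON
library(magrittr)  # %>%

# Parse each JSON string, then transpose so that each record becomes a row
jsonProps <- sapply(df$var2, function(x) fromJSON(x)) %>%
  t() %>%
  as.data.frame()
rownames(jsonProps) <- NULL

y <- cbind(df, jsonProps)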

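(I would be glad to receive suggestions on how to make this more efficient, if possible.)

This no longer works when
  • the number of properties differs between records, and/or
  • the order of the properties changes, and/or
  • different properties are stored in different records.

I am at a loss as to how to create columns dynamically from the properties I find and transfer the property values correctly, so any advice on how to solve this is welcome.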

2 Answers:

Answer 0: (score: 3)

You could do:

library(plyr)
library(jsonlite)

ll = lapply(df$var2, function(x) jsonlite::fromJSON(as.character(x)))
cbind(df, ldply(ll, data.frame))

#  var1                                  var2 property1 property3 property2
#1    a {"property1": "val1", "property3": 8}      val1         8        NA
#2    a {"property1": "val1", "property2": 5}      val1        NA         5

Data:

df = structure(list(var1 = structure(c(1L, 1L), .Label = "a", class = "factor"), 
var2 = structure(1:2, .Label = c("{\"property1\": \"val1\", \"property3\": 8}", 
"{\"property1\": \"val1\", \"property2\": 5}"), class = "factor")), .Names = c("var1", 
"var2"), class = "data.frame", row.names = 1:2)
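As a possible alternative that avoids plyr, and assuming every element of var2 is a valid JSON object, the strings can be glued into a single JSON array and parsed with one fromJSON call; jsonlite then fills properties that are missing from a record with NA. A minimal sketch, with the data frame rebuilt from the question (json_array, props and result are names made up for illustration):

```r
library(jsonlite)

# Example data rebuilt from the question (one JSON object per row in var2).
df <- data.frame(
  var1 = 1:5,
  var2 = c('{"property1": "val1", "property2": 5}',
           '{"property1": "val2", "property2": 8}',
           '{"property1": "val3", "property2": 7}',
           '{"property1": "val4", "property2": 0}',
           '{"property1": "val5", "property3": 9}'),
  stringsAsFactors = FALSE
)

# Join the objects into one JSON array: [ {...}, {...}, ... ]
# as.character() also covers the case where var2 is a factor.
json_array <- paste0("[", paste(as.character(df$var2), collapse = ","), "]")

# fromJSON simplifies an array of objects into a data frame;
# keys that are absent from a record come back as NA.
props <- fromJSON(json_array)

result <- cbind(df, props)
```

This parses everything in one call instead of once per row, which may also help with the efficiency concern raised in the question. It assumes the JSON objects are flat; nested objects would produce nested columns.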

Answer 1: (score: 0)

This doesn't do everything you want, but it may be a better starting point:

library("dplyr")
library("jsonlite")

get_it <- function(x) {
  jsonlite::fromJSON(as.character(x))
}

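# `test` is assumed to be the data frame from the question
tbl_df(test) %>%
  rowwise() %>%
  # [[1]] and [[2]] select properties by position, not by name
  mutate(one = get_it(var2)[[1]],
         two = get_it(var2)[[2]])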

Source: local data frame [5 x 4]
Groups: <by row>

   var1                                  var2   one   two
  (dbl)                                (fctr) (chr) (int)
1     1 {"property1": "val1", "property2": 5}  val1     5
2     2 {"property1": "val2", "property2": 8}  val2     8
3     3 {"property1": "val3", "property2": 7}  val3     7
4     4 {"property1": "val4", "property2": 0}  val4     0
5     5 {"property1": "val5", "property3": 9}  val5     9
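
Because [[1]] and [[2]] pick properties by position, row 5 above ends up with its property3 value in the two column. One way around that, sketched here under the assumption that each JSON object is flat and holds only scalar values, is to look properties up by name and return NA when a name is missing. get_prop is a hypothetical helper written for this example, not a function from jsonlite or dplyr:

```r
library(jsonlite)

# Hypothetical helper: extract one named property from each JSON string,
# returning NA for records that do not contain it.
get_prop <- function(json, name) {
  sapply(as.character(json), function(x) {
    value <- fromJSON(x)[[name]]
    if (is.null(value)) NA else value
  }, USE.NAMES = FALSE)
}

# Example data rebuilt from the question.
test <- data.frame(
  var1 = 1:5,
  var2 = c('{"property1": "val1", "property2": 5}',
           '{"property1": "val2", "property2": 8}',
           '{"property1": "val3", "property2": 7}',
           '{"property1": "val4", "property2": 0}',
           '{"property1": "val5", "property3": 9}'),
  stringsAsFactors = FALSE
)

# One column per property of interest, matched by name rather than position.
test$prop1 <- get_prop(test$var2, "property1")
test$prop2 <- get_prop(test$var2, "property2")
test$prop3 <- get_prop(test$var2, "property3")
```

This still parses each JSON string once per requested property, so it trades speed for clarity; it also requires knowing the property names in advance.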