Convert a JSON file to a data frame in R

Time: 2019-06-29 18:14:28

Tags: r json dataframe

I am new to R and am having trouble converting a JSON file into a data frame. My JSON looks like this:

json_file = '[{"id": "abc", "model": "honda", "date": "20190604", "cols": {"action": 15, "values": 18, "not": 29}},
  {"id": "abc", "model": "honda", "date": "20190604", "cols": {"hello": 14, "hi": 85, "wow": 14}},
  {"id": "mno", "model": "ford", "date": "20190604", "cols": {"yesterday": 21, "today": 21, "tomorrow": 29}},
  {"id": "mno", "model": "ford", "date": "20190604", "cols": {"docs": 25, "ok": 87, "none": 42}}]'

I want to convert the JSON above into a data frame with the following layout:

Expected result

df =
 id model     date      cols values_cols
abc honda 20190604    action          15
abc honda 20190604    values          18
abc honda 20190604       not          29
abc honda 20190604     hello          14
abc honda 20190604        hi          85
abc honda 20190604       wow          14
mno  ford 20190604 yesterday          21
mno  ford 20190604     today          21
mno  ford 20190604  tomorrow          29
mno  ford 20190604      docs          25
mno  ford 20190604        ok          87
mno  ford 20190604      none          42

My result

        id model     date cols id.1 model.1   date.1 cols.1 id.2 model.2   date.2 cols.2 id.3 model.3   date.3 cols.3
action abc honda 20190604   15  abc   honda 20190604     14  mno    ford 20190604     21  mno    ford 20190604     25
values abc honda 20190604   18  abc   honda 20190604     85  mno    ford 20190604     21  mno    ford 20190604     87
not    abc honda 20190604   29  abc   honda 20190604     14  mno    ford 20190604     29  mno    ford 20190604     42
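This is not correct: the keys inside cols end up as row names (the index), and each JSON record is spread out into its own set of columns instead of being stacked as rows.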

My attempt:

require(RJSONIO)
df = fromJSON(json_file)
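For comparison with the RJSONIO attempt above, here is a minimal sketch that avoids the wide intermediate table altogether. It assumes the jsonlite package is installed and parses with simplifyVector = FALSE, so each record stays a plain R list whose cols element is a named list; the helper record_to_long is a hypothetical name introduced only for this illustration.

```r
library(jsonlite)

json_file <- '[{"id": "abc", "model": "honda", "date": "20190604", "cols": {"action": 15, "values": 18, "not": 29}},
  {"id": "abc", "model": "honda", "date": "20190604", "cols": {"hello": 14, "hi": 85, "wow": 14}},
  {"id": "mno", "model": "ford", "date": "20190604", "cols": {"yesterday": 21, "today": 21, "tomorrow": 29}},
  {"id": "mno", "model": "ford", "date": "20190604", "cols": {"docs": 25, "ok": 87, "none": 42}}]'

# simplifyVector = FALSE keeps the JSON structure as nested lists:
# a list of records, each with $id, $model, $date and a named list $cols
records <- fromJSON(json_file, simplifyVector = FALSE)

# Hypothetical helper: one record -> one row per key in its "cols" object.
# The scalar fields are recycled to the number of keys.
record_to_long <- function(rec) {
  data.frame(
    id          = rec$id,
    model       = rec$model,
    date        = rec$date,
    cols        = names(rec$cols),
    values_cols = unlist(rec$cols, use.names = FALSE),
    stringsAsFactors = FALSE
  )
}

# Convert every record and stack the pieces in the original record order
df <- do.call(rbind, lapply(records, record_to_long))
print(df)
```

Because each record is converted on its own, no NA cells are created, so there is nothing to filter out afterwards.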

1 answer:

Answer 0 (score: 0)

The problem when reading the data with jsonlite::fromJSON is that the last column is itself a data frame, not an atomic vector.

tmp <- jsonlite::fromJSON(json_file)
str(tmp)
#'data.frame':   4 obs. of  4 variables:
# $ id   : chr  "abc" "abc" "mno" "mno"
# $ model: chr  "honda" "honda" "ford" "ford"
# $ date : chr  "20190604" "20190604" "20190604" "20190604"
# $ cols :'data.frame':  4 obs. of  12 variables:
#  ..$ action   : int  15 NA NA NA
#  ..$ values   : int  18 NA NA NA
#  ..$ not      : int  29 NA NA NA
#  ..$ hello    : int  NA 14 NA NA
#  ..$ hi       : int  NA 85 NA NA
#  ..$ wow      : int  NA 14 NA NA
#  ..$ yesterday: int  NA NA 21 NA
#  ..$ today    : int  NA NA 21 NA
#  ..$ tomorrow : int  NA NA 29 NA
#  ..$ docs     : int  NA NA NA 25
#  ..$ ok       : int  NA NA NA 87
#  ..$ none     : int  NA NA NA 42
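The same point can be checked directly rather than read off str(). This short sketch (assuming jsonlite is installed) tests each top-level column with is.atomic(): id, model and date are plain character vectors, while cols is a data frame, which R stores as a list and therefore does not count as atomic.

```r
library(jsonlite)

json_file <- '[{"id": "abc", "model": "honda", "date": "20190604", "cols": {"action": 15, "values": 18, "not": 29}},
  {"id": "abc", "model": "honda", "date": "20190604", "cols": {"hello": 14, "hi": 85, "wow": 14}},
  {"id": "mno", "model": "ford", "date": "20190604", "cols": {"yesterday": 21, "today": 21, "tomorrow": 29}},
  {"id": "mno", "model": "ford", "date": "20190604", "cols": {"docs": 25, "ok": 87, "none": 42}}]'

tmp <- fromJSON(json_file)

# TRUE for id, model and date (character vectors), FALSE for cols (a data frame)
is_atomic_col <- vapply(tmp, is.atomic, logical(1))
print(is_atomic_col)

# The nested column has one row per record and one column per distinct key in "cols"
print(is.data.frame(tmp$cols))
print(dim(tmp$cols))
```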

So the last column has to be cbind-ed with the other three columns before reshaping the data from wide format to long format.

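tmp <- cbind(tmp[-4], tmp[[4]])    # replace the nested data frame with its 12 plain columns
df1 <- reshape2::melt(tmp, id.vars = c("id", "model", "date"))    # wide to long
names(df1)[4:5] <- c("cols", "values_cols")
df1 <- df1[complete.cases(df1), ]  # drop NA cells, i.e. keys that belong to other records
row.names(df1) <- NULL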

df1
#    id model     date      cols values_cols
#1  abc honda 20190604    action          15
#2  abc honda 20190604    values          18
#3  abc honda 20190604       not          29
#4  abc honda 20190604     hello          14
#5  abc honda 20190604        hi          85
#6  abc honda 20190604       wow          14
#7  mno  ford 20190604 yesterday          21
#8  mno  ford 20190604     today          21
#9  mno  ford 20190604  tomorrow          29
#10 mno  ford 20190604      docs          25
#11 mno  ford 20190604        ok          87
#12 mno  ford 20190604      none          42
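If you prefer not to depend on reshape2, the wide-to-long step can be sketched in base R with stack(), which concatenates the columns of a data frame into a values column plus an ind column naming the source column. This is a swapped-in technique, not the answer's method, and it assumes jsonlite is installed; the names nested, ids, stacked and long are illustrative only.

```r
library(jsonlite)

json_file <- '[{"id": "abc", "model": "honda", "date": "20190604", "cols": {"action": 15, "values": 18, "not": 29}},
  {"id": "abc", "model": "honda", "date": "20190604", "cols": {"hello": 14, "hi": 85, "wow": 14}},
  {"id": "mno", "model": "ford", "date": "20190604", "cols": {"yesterday": 21, "today": 21, "tomorrow": 29}},
  {"id": "mno", "model": "ford", "date": "20190604", "cols": {"docs": 25, "ok": 87, "none": 42}}]'

tmp <- fromJSON(json_file)
nested <- tmp$cols                      # 12 columns, NA where a record lacks that key
ids <- tmp[c("id", "model", "date")]    # the three atomic columns

# stack() places the columns of `nested` one after another:
# `values` holds the numbers, `ind` the name of the column each came from
stacked <- stack(nested)

# stack() works column by column, so repeat the id rows once per column
long <- data.frame(
  ids[rep(seq_len(nrow(ids)), times = ncol(nested)), , drop = FALSE],
  cols = as.character(stacked$ind),
  values_cols = stacked$values,
  stringsAsFactors = FALSE
)

# Keep only the cells that were actually present in the JSON
long <- long[!is.na(long$values_cols), ]
row.names(long) <- NULL
print(long)
```

Since both stack() and melt() walk the wide table column by column, the rows come out in the same order as df1 above.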

Finally, clean up .GlobalEnv:

rm(tmp)    # no longer needed.