在将json文件转换为dataframe方面,我是R面临的新问题。我有如下所示的json文件:
json_file = '[{"id": "abc", "model": "honda", "date": "20190604", "cols": {"action": 15, "values": 18, "not": 29}},
{"id": "abc", "model": "honda", "date": "20190604", "cols": {"hello": 14, "hi": 85, "wow": 14}},
{"id": "mno", "model": "ford", "date": "20190604", "cols": {"yesterday": 21, "today": 21, "tomorrow": 29}},
{"id": "mno", "model": "ford", "date": "20190604", "cols": {"docs": 25, "ok": 87, "none": 42}}]'
我想将上述json文件转换为以下格式的数据帧:
预期结果
df =
id model date cols values_cols
abc honda 20190604 action 15
abc honda 20190604 values 18
abc honda 20190604 not 29
abc honda 20190604 hello 14
abc honda 20190604 hi 85
abc honda 20190604 wow 14
mno ford 20190604 yesterday 21
mno ford 20190604 today 21
mno ford 20190604 tomorrow 29
mno ford 20190604 docs 25
mno ford 20190604 ok 87
我的结果
id model date cols id.1 model.1 date.1 cols.1 id.2 model.2 date.2 cols.2 id.3 model.3 date.3 cols.3
action abc honda 20190604 15 abc honda 20190604 14 mno ford 20190604 21 mno ford 20190604 25
values abc honda 20190604 18 abc honda 20190604 85 mno ford 20190604 21 mno ford 20190604 87
not abc honda 20190604 29 abc honda 20190604 14 mno ford 20190604 29 mno ford 20190604 42
It's not correct, as it is taking as index.
我的解决方案:
require(RJSONIO)
df = fromJSON(json_file)
答案 0 :(得分:0)
使用jsonlite::fromJSON
读取数据时的问题是最后一列是数据帧,而不是原子向量。
tmp <- jsonlite::fromJSON(json_file)
str(tmp)
#'data.frame': 4 obs. of 4 variables:
# $ id : chr "abc" "abc" "mno" "mno"
# $ model: chr "honda" "honda" "ford" "ford"
# $ date : chr "20190604" "20190604" "20190604" "20190604"
# $ cols :'data.frame': 4 obs. of 12 variables:
# ..$ action : int 15 NA NA NA
# ..$ values : int 18 NA NA NA
# ..$ not : int 29 NA NA NA
# ..$ hello : int NA 14 NA NA
# ..$ hi : int NA 85 NA NA
# ..$ wow : int NA 14 NA NA
# ..$ yesterday: int NA NA 21 NA
# ..$ today : int NA NA 21 NA
# ..$ tomorrow : int NA NA 29 NA
# ..$ docs : int NA NA NA 25
# ..$ ok : int NA NA NA 87
# ..$ none : int NA NA NA 42
因此,最后一列必须与reshaping the data from wide format to long format之前的其他三列cbind
在一起。
tmp <- cbind(tmp[-4], tmp[[4]])
df1 <- reshape2::melt(tmp, id.vars = c("id", "model", "date"))
names(df1)[4:5] <- c("cols", "values_cols")
df1 <- df1[complete.cases(df1), ]
row.names(df1) <- NULL
df1
# id model date cols values_cols
#1 abc honda 20190604 action 15
#2 abc honda 20190604 values 18
#3 abc honda 20190604 not 29
#4 abc honda 20190604 hello 14
#5 abc honda 20190604 hi 85
#6 abc honda 20190604 wow 14
#7 mno ford 20190604 yesterday 21
#8 mno ford 20190604 today 21
#9 mno ford 20190604 tomorrow 29
#10 mno ford 20190604 docs 25
#11 mno ford 20190604 ok 87
#12 mno ford 20190604 none 42
现在清理.GlobalEnv
。
rm(tmp) # no longer needed.