I have a JSON list saved as text, and I am trying to convert it into a data frame. The list looks like this:
`{"posts": {
"data": [
{
"comments": {
"data": [
{
"created_time": "2020-01-25T16:48:03+0000",
"message": "I love all kind of art and paintings. However 19 thousand dollars for a painting is entirely too much!!. ",
"id": "1579832716452874_1373756966159579"
},
{
"created_time": "2020-01-25T15:21:29+0000",
"message": "The wind blows a piece of paper and lands next to your house. You unravel it, \"Hey hope you are having a nice day!\"...",
"id": "1579832716452874_1373704542831488"
}
]
}
}
]
}
}`
and so on. I would like the data frame to have the following columns:
`created_time|message|id`
along with their respective data. I tried the following command, but it did not work; the output I get is exactly the same as the input:
` df <- data.frame(matrix(unlist(data), ncol=length(data), byrow = FALSE))`
Also, because the data is saved as text, the JSON packages (rjson, jsonlite) will not work. Any suggestions would be greatly appreciated.
Answer 0 (score: 0)
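The problem you seem to be running into is the quotes. If you save the JSON object to a text file, {jsonlite} will handle the escaped quote characters automatically when it reads the file. By default, `fromJSON` flattens lists into data frames where possible, which is what you want here.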
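# Read the JSON text file; fromJSON simplifies arrays of records into data frames by default
x <- jsonlite::fromJSON("complex_json.json")
# Drill down to the comments data frame of the first post
x_df <- x$posts$data[1]$comments$data[[1]]
tibble::as_tibble(x_df) # tibble is for pretty-printing purposes only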
# A tibble: 2 x 3
  created_time        message                                                          id
  <chr>               <chr>                                                            <chr>
1 2020-01-25T16:48:0~ "I love all kind of art and paintings. However 19 thousand doll~ 1579832716452874_137~
2 2020-01-25T15:21:2~ "The wind blows a piece of paper and lands next to your house. ~ 1579832716452874_137~
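The only tricky part is working out `x$posts$data[1]$comments$data[[1]]`. I always do this interactively in the console, occasionally checking `str()` on the resulting object to see where nested levels are hiding that do not print clearly.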
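If it helps, here is a rough sketch of that exploration, extended to pull the comments of every post rather than just the first. It rests on a few assumptions that are not in the question: the inline `json_text` sample is made up to mirror the structure shown there, the names `json_text` and `all_comments` are purely illustrative, and every post is assumed to carry a `comments$data` array with the same three fields.

```r
library(jsonlite)

# Hypothetical sample with the same nesting as in the question.
# fromJSON() accepts a JSON string as well as a file path, so JSON
# stored as plain text parses the same way.
json_text <- '{"posts": {"data": [
  {"comments": {"data": [
    {"created_time": "2020-01-25T16:48:03+0000", "message": "first", "id": "a_1"},
    {"created_time": "2020-01-25T15:21:29+0000", "message": "He said \\"hi\\"", "id": "a_2"}
  ]}},
  {"comments": {"data": [
    {"created_time": "2020-01-24T10:00:00+0000", "message": "third", "id": "b_1"}
  ]}}
]}}'

x <- fromJSON(json_text)

# Inspect the nesting a few levels at a time instead of printing everything
str(x, max.level = 3)

# x$posts$data$comments$data is a list holding one data frame per post;
# stack them into a single data frame (assumes identical columns in each)
all_comments <- do.call(rbind, x$posts$data$comments$data)
all_comments
```

Stacking with `do.call(rbind, ...)` is only one way to combine the per-post data frames. It keeps the sketch in base R, but it will fail if posts carry different comment fields.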