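I'm trying to import data from a JSON file into R so I can experiment with natural language processing. The data was parsed and extracted from blog posts written in markdown. The problem is that R imports it as a nested list with an odd format, and I can't figure out how to get it into a data frame. Is this a problem with my JSON file or with my import process?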
Sample data:
{
"2017-11-17-blog-post-01": {
"title": "Blog Post 01",
"layout": "post",
"categories": [
"Category1",
"Category2"
],
"comments": true,
"published": true,
"permalink": "/blog-post-01.html",
"basename": "2017-11-17-blog-post-01"
},
"2017-11-30-blog-post-02": {
"title": "Blog Post 2",
"layout": "post",
"categories": [
"Category2",
"Category3"
],
"comments": true,
"published": true,
"permalink": "/2017-11-30-blog-post-02.html",
"basename": "2017-11-30-blog-post-02"
}
}
Command:
library(jsonlite)
import <- fromJSON("test-import.json", flatten=TRUE)
Result:
$`2017-11-17-blog-post-01`
$`2017-11-17-blog-post-01`$title
[1] "Blog Post 01"
$`2017-11-17-blog-post-01`$layout
[1] "post"
$`2017-11-17-blog-post-01`$categories
[1] "Category1" "Category2"
$`2017-11-17-blog-post-01`$comments
[1] TRUE
$`2017-11-17-blog-post-01`$published
[1] TRUE
$`2017-11-17-blog-post-01`$permalink
[1] "/blog-post-01.html"
$`2017-11-17-blog-post-01`$basename
[1] "2017-11-17-blog-post-01"
$`2017-11-30-blog-post-02`
$`2017-11-30-blog-post-02`$title
[1] "Blog Post 2"
$`2017-11-30-blog-post-02`$layout
[1] "post"
$`2017-11-30-blog-post-02`$categories
[1] "Category2" "Category3"
$`2017-11-30-blog-post-02`$comments
[1] TRUE
$`2017-11-30-blog-post-02`$published
[1] TRUE
$`2017-11-30-blog-post-02`$permalink
[1] "/2017-11-30-blog-post-02.html"
$`2017-11-30-blog-post-02`$basename
[1] "2017-11-30-blog-post-02"
Answer:
library(purrr)
Your data:
jsonlite::fromJSON('{
"2017-11-17-blog-post-01": {
"title": "Blog Post 01",
"layout": "post",
"categories": [
"Category1",
"Category2"
],
"comments": true,
"published": true,
"permalink": "/blog-post-01.html",
"basename": "2017-11-17-blog-post-01"
},
"2017-11-30-blog-post-02": {
"title": "Blog Post 2",
"layout": "post",
"categories": [
"Category2",
"Category3"
],
"comments": true,
"published": true,
"permalink": "/2017-11-30-blog-post-02.html",
"basename": "2017-11-30-blog-post-02"
}
}', flatten=TRUE) -> jsdat
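flatten=TRUE is helpful most of the time, but here the posts are keyed by name in a JSON object rather than listed in an array, so jsonlite returns a named list, and each post's categories field holds more than one value, so it can't sit in an ordinary data frame cell. We can help it along by wrapping categories in a list so each post becomes a single row: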
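# map_df() row-binds one result per post (it needs dplyr installed)
map_df(jsdat, ~{
  # wrap the multi-valued categories vector in a list so it fills one cell
  .x$categories <- list(.x$categories)
  .x
}, .id="id")  # .id stores each list name (the post key) in an "id" column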
## # A tibble: 2 x 8
## id title layout categories comments published permalink basename
## <chr> <chr> <chr> <list> <lgl> <lgl> <chr> <chr>
## 1 2017-11-17-blog-post-01 Blog Post 01 post <chr [2]> TRUE TRUE /blog-post-01.html 2017-11-17-blog-post-01
## 2 2017-11-30-blog-post-02 Blog Post 2 post <chr [2]> TRUE TRUE /2017-11-30-blog-post-02.html 2017-11-30-blog-post-02
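For NLP work or counting by category, the categories list-column is often easier to use once expanded to one row per post/category pair. A minimal sketch using tidyr::unnest(), assuming tidyr 1.0 or later; the small posts tibble is a hand-built stand-in for the map_df() result above (only a subset of its columns):

```r
library(tibble)
library(tidyr)

# Stand-in for the tibble produced by map_df() above (subset of columns).
posts <- tibble(
  id = c("2017-11-17-blog-post-01", "2017-11-30-blog-post-02"),
  title = c("Blog Post 01", "Blog Post 2"),
  categories = list(c("Category1", "Category2"), c("Category2", "Category3"))
)

# unnest() repeats id and title once per element of categories,
# turning the list-column into an ordinary character column.
post_categories <- unnest(posts, cols = categories)
post_categories
```

Keeping the list-column form is still useful when you want exactly one row per post (for example, one document per row for text processing); unnest only when you need per-category rows.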