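I'm very new to R and have been trying to solve a problem all day. Unfortunately, I can't figure it out.
I want to import a JSON file into R and then be able to work with it further, the same way I would after importing a CSV file.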
Each record in my JSON file has the following structure:
{ "reviewerID": "A2SUAM1J3GNN3B",
"asin": "0000013714",
"reviewerName": "J. McDonald",
"helpful": [2, 3],
"reviewText": "I bought this for my husband who plays the piano. He is having a wonderful time playing these old hymns. The music is at times hard to read because we think the book was published for singing from more than playing from. Great purchase though!",
"overall": 5.0,
"summary": "Heavenly Highway Hymns",
"unixReviewTime": 1252800000,
"reviewTime": "09 13, 2009"
}
I want to import the JSON file and end up with a table of 9 columns (reviewerID, asin, reviewerName, etc.).
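A single record parses cleanly on its own; only the array-valued helpful field needs reshaping before it fits into one table row. A minimal sketch, assuming the jsonlite package; the column names helpful_votes and total_votes are hypothetical, and reading helpful as [helpful votes, total votes] is an assumption:

```r
library(jsonlite)

# One record from the question (reviewText shortened); newlines between fields are valid JSON
record <- '{"reviewerID": "A2SUAM1J3GNN3B", "asin": "0000013714",
  "reviewerName": "J. McDonald", "helpful": [2, 3],
  "reviewText": "I bought this for my husband who plays the piano.",
  "overall": 5.0, "summary": "Heavenly Highway Hymns",
  "unixReviewTime": 1252800000, "reviewTime": "09 13, 2009"}'

x <- fromJSON(record)   # a named list holding the nine fields
length(x)

# helpful is a length-2 vector, so split it into two scalar columns
# (hypothetical names; assumes the pair means [helpful votes, total votes])
row <- data.frame(x[setdiff(names(x), "helpful")],
                  helpful_votes = x$helpful[1],
                  total_votes = x$helpful[2],
                  stringsAsFactors = FALSE)
names(row)
```

Splitting helpful gives one column more than the nine JSON fields, but every cell stays a plain scalar, which is what a CSV-like table needs.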
I tried the R package jsonlite, but when I do, I get the following error message:
data <- fromJSON('reviews_Office_Products.json.gz2')
Error in feed_push_parser(buf) : parse error: trailing garbage
"reviewTime": "07 19, 2013"} {"reviewerID": "A3BBNK2R5TUYGV"
(right here) ------^
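The "trailing garbage" error is what jsonlite::fromJSON() reports when it finishes one complete JSON value and finds more text after it, which is exactly what a file with one review object per line looks like. jsonlite also provides stream_in() for this newline-delimited layout. A minimal sketch on two hypothetical records:

```r
library(jsonlite)

# Two hypothetical records in newline-delimited JSON: one complete object per line
tmp <- tempfile(fileext = ".json")
writeLines(c(
  '{"reviewerID": "A1", "asin": "0000013714", "helpful": [2, 3], "overall": 5.0}',
  '{"reviewerID": "A2", "asin": "B00000JBLH", "helpful": [0, 1], "overall": 4.0}'
), tmp)

# fromJSON(tmp) would stop with "trailing garbage"; stream_in() parses the file
# line by line and binds the records into a single data frame
reviews <- stream_in(file(tmp), verbose = FALSE)
str(reviews)
```

If the review file is gzip-compressed, wrapping its path in gzfile() instead of file() should let stream_in() read it without unpacking it first.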
Do you know how I can get this done?
Thanks a lot.
Best regards, Paul
Answer 0 (score: 1)
In the end I did the following:
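library(rjson)
library(plyr)  # ldply() comes from plyr
url <- "reviews_Office_Products.json.gz2"
con <- file(url, "r")
input <- readLines(con, -1L)  # the file holds one JSON object per line
close(con)
# Parse each line and turn it into a one-row matrix with t(); ldply() then stacks
# the rows by column name, filling fields missing from a record with NA
tr.review <- ldply(lapply(input, function(x) t(unlist(fromJSON(x)))))
save(tr.review, file = "tr.review.rdata")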
For my purposes this works, and I can process the data further with the tm package.
Thanks a lot for your help. Paul
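One side effect of the unlist() step in this answer is that every column comes back as character, and the two numbers in helpful turn into separate columns named helpful1 and helpful2. A sketch of restoring numeric types afterwards, using two hypothetical records and base do.call(rbind, ...) in place of ldply() to stay dependency-light (unlike ldply() on matrices, base rbind() assumes every line has the same fields in the same order):

```r
library(rjson)

# Two hypothetical lines in the one-object-per-line layout of the review file
input <- c(
  '{"reviewerID": "A1", "asin": "0000013714", "helpful": [2, 3], "overall": 5.0, "unixReviewTime": 1252800000}',
  '{"reviewerID": "A2", "asin": "B00000JBLH", "helpful": [0, 1], "overall": 4.0, "unixReviewTime": 1252886400}'
)

# Same idea as the answer: parse each line and flatten it with unlist(),
# which coerces all values to character and names the pair helpful1/helpful2
rows <- lapply(input, function(x) unlist(fromJSON(x)))
tr.review <- as.data.frame(do.call(rbind, rows), stringsAsFactors = FALSE)

# Convert the numeric fields back, but keep asin as text so leading zeros survive
num_cols <- c("helpful1", "helpful2", "overall", "unixReviewTime")
tr.review[num_cols] <- lapply(tr.review[num_cols], as.numeric)
str(tr.review)
```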
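Answer 1 (score: 0)
This works. You may need to adjust the regular expression to fit your data. Note that in R regular expressions you need double backslashes instead of single ones, because the pattern is written as a string literal (for example "\\{" to match a literal "{").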
library(rjson)
library(magrittr)
library(dplyr)
library(lubridate)
library(stringi)
options(stringsAsFactors = FALSE)
# Sample data: two reviews inside a top-level "reviews" array, separated by "} {"
'{ "reviews": [ { "reviewerID": "A2SUAM1J3GNN3B",
"asin": "0000013714",
"reviewerName": "J. McDonald",
"helpful": [2, 3],
"reviewText": "I bought this for my husband who plays the piano. He is having a wonderful time playing these old hymns. The music is at times hard to read because we think the book was published for singing from more than playing from. Great purchase though!",
"overall": 5.0,
"summary": "Heavenly Highway Hymns",
"unixReviewTime": 1252800000,
"reviewTime": "09 13, 2009"
} { "reviewerID": "A2SUAM1J3GNN3B",
"asin": "0000013714",
"reviewerName": "J. McDonald",
"helpful": [2, 3],
"reviewText": "I bought this for my husband who plays the piano. He is having a wonderful time playing these old hymns. The music is at times hard to read because we think the book was published for singing from more than playing from. Great purchase though!",
"overall": 5.0,
"summary": "Heavenly Highway Hymns",
"unixReviewTime": 1252800000,
"reviewTime": "09 13, 2009"
} ] }' %>%
  writeLines("reviews_Office_Products.json.gz2")  # note: overwrites any existing file with this name
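data =
  "reviews_Office_Products.json.gz2" %>%
  readLines %>%
  # turn "} {" between records into "},{" so they form a valid JSON array
  stri_replace_all_regex("\\}[ \\n]*\\{", "},{") %>%
  paste(collapse = "\n") %>%
  fromJSON %>%
  .[[1]] %>%                    # the "reviews" array
  lapply(as.data.frame) %>%     # helpful = c(2, 3) yields two rows per review
  bind_rows %>%
  select(-unixReviewTime) %>%
  # keep asin as character: ASINs can contain letters and leading zeros
  mutate(reviewTime = mdy(reviewTime))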
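# One row per review (the two helpful rows collapse back into one)
review =
  data %>%
  select(-helpful) %>%
  distinct
# Helpfulness votes keyed by reviewer and product, since a reviewer can review several products
review__helpful =
  data %>%
  select(reviewerID, asin, helpful) %>%
  distinct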
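The backslash rule and the "} {" rewrite from this answer can be checked in isolation. A sketch on a hypothetical two-line input; because a raw review file has no enclosing "reviews" array, it wraps the joined objects in [ ] itself before parsing with rjson:

```r
library(stringi)
library(rjson)

# Two hypothetical records, one per line, as readLines() would return them
lines <- c('{"reviewerID": "A1", "overall": 5.0}',
           '{"reviewerID": "A2", "overall": 4.0}')

# In the R literal "\\}\\s*\\{" each "\\" reaches the regex engine as one
# backslash, so the pattern matches "}" + optional whitespace + "{"
joined <- stri_replace_all_regex(paste(lines, collapse = "\n"), "\\}\\s*\\{", "},{")

# Wrap the comma-separated objects in an array so the text is a single JSON value
parsed <- fromJSON(paste0("[", joined, "]"))
length(parsed)
```

One caveat of this design: a "} {" sequence inside a review's text would be rewritten too, so parsing the file line by line is the safer choice on real data.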