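I'm struggling to parse JSON in R that contains newlines both inside character strings and between key/value pairs (and between whole objects).
Here is the kind of format I mean: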
{
"id": 123456,
"name": "Try to parse this",
"description": "Thought reading a JSON was easy? \r\n Try parsing a newline within a string."
}
{
"id": 987654,
"name": "Have another go",
"description": "Another two line description... \r\n With 2 lines."
}
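Say I've saved this JSON as example.json. I've tried various techniques to get around the parsing problems, following suggestions elsewhere on SO. None of the following works: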
library(jsonlite)
foo <- readLines("example.json")
foo <- paste(readLines("example.json"), collapse = "")
bar <- fromJSON(foo)
bar <- jsonlite::stream_in(textConnection(foo))
bar <- purrr::map(foo, jsonlite::fromJSON)
bar <- ndjson::stream_in(textConnection(foo))
bar <- read_json(textConnection(foo), format = "jsonl")
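I gather that this is effectively NDJSON format, but none of the dedicated packages can cope with it. Some have suggested streaming in the data with either jsonlite or ndjson (or this one and this one). Others have suggested mapping the function across lines (or similarly in base R).
Every attempt raises one of the following errors:
Error: parse error: trailing garbage
or Error: parse error: premature EOF
or a problem opening the text connection.
Does anyone have a solution?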
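Answer 0 (score: 0)
Edit
Knowing that the JSON is malformed, we lose the efficiency of ndjson, but I think we can fix it on the fly, assuming that you clearly have a closing brace (}) followed by nothing or some whitespace (including newlines), followed by an opening brace ({).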
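fn <- "~/StackOverflow/TomWagstaff.json"
# Read all lines and join them into a single string
wrongjson <- paste(readLines(fn), collapse = "")
# If objects are simply concatenated, separate them with commas and wrap them in an array
if (grepl("\\}\\s*\\{", wrongjson))
  wrongjson <- paste0("[", gsub("\\}\\s*\\{", "},{", wrongjson), "]")
# Parse the repaired text; keep it as a list rather than a data frame
json <- jsonlite::fromJSON(wrongjson, simplifyDataFrame = FALSE)
str(json)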
# List of 2
# $ :List of 3
# ..$ id : int 123456
# ..$ name : chr "Try to parse this"
# ..$ description: chr "Thought reading a JSON was easy? \r\n Try parsing a newline within a string."
# $ :List of 3
# ..$ id : int 987654
# ..$ name : chr "Have another go"
# ..$ description: chr "Another two line description... \r\n With 2 lines."
From here, you can continue with
txtjson <- paste(sapply(json, jsonlite::toJSON, pretty = TRUE), collapse = "\n")
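The regex repair above relies on the assumption that a closing brace followed by an opening brace never occurs inside a string value. If that cannot be guaranteed, one possible alternative is to split the text by tracking brace depth and string state. The following is only a minimal sketch under that assumption: split_json_objects is a hypothetical helper written for illustration (it is not part of jsonlite), and it handles only top-level objects, not top-level arrays.

```r
library(jsonlite)

# Hypothetical helper: split concatenated top-level JSON objects,
# ignoring braces that appear inside string values.
split_json_objects <- function(txt) {
  chars <- strsplit(txt, "", fixed = TRUE)[[1]]
  depth <- 0L          # current nesting depth of { }
  in_string <- FALSE   # are we inside a "..." value?
  escaped <- FALSE     # was the previous character a backslash?
  start <- NA_integer_
  out <- character(0)
  for (i in seq_along(chars)) {
    ch <- chars[i]
    if (in_string) {
      if (escaped) {
        escaped <- FALSE
      } else if (ch == "\\") {
        escaped <- TRUE
      } else if (ch == "\"") {
        in_string <- FALSE
      }
    } else if (ch == "\"") {
      in_string <- TRUE
    } else if (ch == "{") {
      if (depth == 0L) start <- i
      depth <- depth + 1L
    } else if (ch == "}") {
      depth <- depth - 1L
      # A return to depth 0 closes one complete top-level object
      if (depth == 0L) out <- c(out, paste(chars[start:i], collapse = ""))
    }
  }
  out
}

# Made-up input in which the first string value contains "}{"
txt <- '{"a":"x}{y"}\n{"b":2}'
parts <- split_json_objects(txt)
parsed <- lapply(parts, fromJSON)
```

Each element of parts is a complete JSON object, so it can be handed to fromJSON individually. Looping over single characters is slow for large files, so this is better suited to checking or repairing small inputs.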
(The original answer follows; it hoped/assumed that the format was somehow legitimate.)
Assuming your data actually looks like this:
{"id":123456,"name":"Try to parse this","description":"Thought reading a JSON was easy? \r\n Try parsing a newline within a string."}
{"id": 987654,"name":"Have another go","description":"Another two line description... \r\n With 2 lines."}
then it is ndjson, as you suspected. From that, you can do the following:
fn <- "~/StackOverflow/TomWagstaff.json"
json <- jsonlite::stream_in(file(fn), simplifyDataFrame = FALSE)
# opening file input connection.
# Imported 2 records. Simplifying...
# closing file input connection.
str(json)
# List of 2
# $ :List of 3
# ..$ id : int 123456
# ..$ name : chr "Try to parse this"
# ..$ description: chr "Thought reading a JSON was easy? \r\n Try parsing a newline within a string."
# $ :List of 3
# ..$ id : int 987654
# ..$ name : chr "Have another go"
# ..$ description: chr "Another two line description... \r\n With 2 lines."
Note that I have not simplified it into a data frame. To get your literal block on the console, do
cat(sapply(json, jsonlite::toJSON, pretty = TRUE), sep = "\n")
# {
# "id": [123456],
# "name": ["Try to parse this"],
# "description": ["Thought reading a JSON was easy? \r\n Try parsing a newline within a string."]
# }
# {
# "id": [987654],
# "name": ["Have another go"],
# "description": ["Another two line description... \r\n With 2 lines."]
# }
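If you want to dump it to a file in this form (realizing that nothing in jsonlite or similar packages will be able to read it back, since it is no longer legal ndjson, nor legal JSON as a whole file), then you can build the text with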
txtjson <- paste(sapply(json, jsonlite::toJSON, pretty = TRUE), collapse = "\n")
and then save it with writeLines or similar.
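If the file should stay readable by jsonlite, one option is to write one compact object per line instead of the pretty-printed blocks. This is a rough sketch rather than part of the original answer: the records list is made-up stand-in data mirroring the example, the output goes to a temporary file, and auto_unbox = TRUE is used so that scalars are written as 123456 rather than [123456].

```r
library(jsonlite)

# Stand-in for the parsed list (same shape as the str(json) output above)
records <- list(
  list(id = 123456L, name = "Try to parse this",
       description = "Thought reading a JSON was easy? \r\n Try parsing a newline within a string."),
  list(id = 987654L, name = "Have another go",
       description = "Another two line description... \r\n With 2 lines.")
)

# One compact JSON object per record; newlines inside strings are escaped by toJSON
ndjson_lines <- vapply(
  records,
  function(rec) as.character(toJSON(rec, auto_unbox = TRUE)),
  character(1)
)

# Write legal ndjson to a temporary file and read it back
out <- tempfile(fileext = ".ndjson")
writeLines(ndjson_lines, out)
roundtrip <- stream_in(file(out), simplifyDataFrame = FALSE, verbose = FALSE)
```

Because every record occupies exactly one line, the result can be streamed back in with stream_in, unlike the pretty-printed dump.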