R: How to parse pretty-printed JSON with newlines inside strings

Date: 2019-07-10 17:10:23

Tags: r json parsing newline ndjson

I am struggling to parse JSON in R that contains newlines both inside strings and between key/value pairs (and between whole objects).

This is the format I am trying to parse:

{
    "id": 123456,
    "name": "Try to parse this",
    "description": "Thought reading a JSON was easy? \r\n Try parsing a newline within a string."
}
{
    "id": 987654,
    "name": "Have another go",
    "description": "Another two line description... \r\n With 2 lines."
}

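Say I have saved this JSON as example.json. I have tried various techniques suggested elsewhere on SO to get around the parsing problem. None of the following works: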

library(jsonlite)

foo <- readLines("example.json")
foo <- paste(readLines("example.json"), collapse = "")

bar <- fromJSON(foo)
bar <- jsonlite::stream_in(textConnection(foo))
bar <- purrr::map(foo, jsonlite::fromJSON)
bar <- ndjson::stream_in(textConnection(foo))
bar <- read_json(textConnection(foo), format = "jsonl")

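I gather that this is effectively NDJSON, but none of the dedicated packages can cope with it. Some answers suggest streaming in the data with either jsonlite or ndjson; others suggest mapping the parsing function across lines, or doing something similar in base R.

Every attempt raises one of the following errors: "Error: parse error: trailing garbage", "Error: parse error: premature EOF", or a problem opening the text connection.

Does anyone have a solution?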

1 answer:

Answer 0 (score: 0)

Edit

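Knowing that the JSON is malformed, we lose some of the efficiency of ndjson, but I think we can fix it on the fly, assuming that object boundaries always look like a close-brace (}), followed by nothing or some whitespace (including newlines), followed by an open-brace ({):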

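fn <- "~/StackOverflow/TomWagstaff.json"
# Read the whole file into a single string.
wrongjson <- paste(readLines(fn), collapse = "")
# If objects are simply concatenated, join them with commas and wrap them in [ ].
if (grepl("\\}\\s*\\{", wrongjson))
  wrongjson <- paste0("[", gsub("\\}\\s*\\{", "},{", wrongjson), "]")
# Parse the repaired text, keeping a list rather than a data frame.
json <- jsonlite::fromJSON(wrongjson, simplifyDataFrame = FALSE)
str(json)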
# List of 2
#  $ :List of 3
#   ..$ id         : int 123456
#   ..$ name       : chr "Try to parse this"
#   ..$ description: chr "Thought reading a JSON was easy? \r\n Try parsing a newline within a string."
#  $ :List of 3
#   ..$ id         : int 987654
#   ..$ name       : chr "Have another go"
#   ..$ description: chr "Another two line description... \r\n With 2 lines."
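
The regular expression above would also rewrite a "} {" sequence that happens to sit inside a string value. As a hedged alternative, here is a minimal base-R sketch that splits concatenated objects by tracking brace depth and string state instead. The helper name split_json_objects is hypothetical (it is not part of jsonlite), and the sketch assumes every top-level value is an object rather than an array or scalar.

```r
# Hypothetical helper (not part of jsonlite): split a string holding several
# concatenated JSON objects into one string per top-level object.
split_json_objects <- function(txt) {
  chars <- strsplit(txt, "")[[1]]
  depth <- 0L           # current nesting depth of { ... }
  in_string <- FALSE    # TRUE while inside a "..." literal
  escaped <- FALSE      # TRUE right after a backslash inside a string
  start <- NA_integer_  # index where the current top-level object began
  out <- character(0)
  for (i in seq_along(chars)) {
    ch <- chars[i]
    if (in_string) {
      if (escaped) {
        escaped <- FALSE          # this character was escaped; skip it
      } else if (ch == "\\") {
        escaped <- TRUE
      } else if (ch == "\"") {
        in_string <- FALSE        # closing quote
      }
    } else if (ch == "\"") {
      in_string <- TRUE           # opening quote
    } else if (ch == "{") {
      if (depth == 0L) start <- i
      depth <- depth + 1L
    } else if (ch == "}") {
      depth <- depth - 1L
      if (depth == 0L) {
        # A top-level object just closed; keep its text.
        out <- c(out, paste(chars[start:i], collapse = ""))
      }
    }
  }
  out
}

# Braces inside the string value and in the nested object are not boundaries.
txt <- '{"id": 1, "note": "braces } { inside a string"}\n{"id": 2, "nested": {"a": 1}}'
pieces <- split_json_objects(txt)
```

Each element of pieces should then be a complete JSON document, so something like lapply(pieces, jsonlite::fromJSON) can parse them one at a time. The character-by-character loop is slow on large files; it trades speed for not being fooled by braces inside strings.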

From here, you can continue with

txtjson <- paste(sapply(json, jsonlite::toJSON, pretty = TRUE), collapse = "\n")

(Below is the original answer, which hoped/assumed that the format was at least somewhat legitimate.)


Assuming your data actually looks like this:

{"id":123456,"name":"Try to parse this","description":"Thought reading a JSON was easy? \r\n Try parsing a newline within a string."}
{"id": 987654,"name":"Have another go","description":"Another two line description... \r\n With 2 lines."}

then it is, as you suspected, ndjson. In that case you can do the following:

fn <- "~/StackOverflow/TomWagstaff.json"
json <- jsonlite::stream_in(file(fn), simplifyDataFrame = FALSE)
# opening file input connection.
#  Imported 2 records. Simplifying...
# closing file input connection.
str(json)
# List of 2
#  $ :List of 3
#   ..$ id         : int 123456
#   ..$ name       : chr "Try to parse this"
#   ..$ description: chr "Thought reading a JSON was easy? \r\n Try parsing a newline within a string."
#  $ :List of 3
#   ..$ id         : int 987654
#   ..$ name       : chr "Have another go"
#   ..$ description: chr "Another two line description... \r\n With 2 lines."
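
For trying this without a file, a small in-memory sketch may help. It assumes the jsonlite package is installed; the two records are shortened stand-ins for the question's data. The point it illustrates is that each NDJSON record must stay on its own line, so the lines are passed as separate vector elements rather than collapsed into one string.

```r
library(jsonlite)

# Two records in NDJSON form: one complete JSON object per line.
ndjson_lines <- c(
  '{"id":123456,"name":"Try to parse this"}',
  '{"id":987654,"name":"Have another go"}'
)

# Keep the lines as separate elements; collapsing them into one string
# would put two objects on a single line, which is no longer NDJSON.
con <- textConnection(ndjson_lines)
records <- stream_in(con, simplifyDataFrame = FALSE, verbose = FALSE)
close(con)  # a textConnection is already open, so stream_in leaves it open

str(records)
```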

Note that I have not simplified the result to a data frame. To print the literal block to the console, use

cat(sapply(json, jsonlite::toJSON, pretty = TRUE), sep = "\n")
# {
#   "id": [123456],
#   "name": ["Try to parse this"],
#   "description": ["Thought reading a JSON was easy? \r\n Try parsing a newline within a string."]
# }
# {
#   "id": [987654],
#   "name": ["Have another go"],
#   "description": ["Another two line description... \r\n With 2 lines."]
# }

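If you want to dump it to a file in this form (bearing in mind that nothing in jsonlite or similar packages will be able to read it back, because it is no longer legal ndjson, nor is the file as a whole legal json), then you can do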

txtjson <- paste(sapply(json, jsonlite::toJSON, pretty = TRUE), collapse = "\n")

and then save it with writeLines or similar.
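
If the goal is instead a file that can be read back, one option is to write one compact object per line, which is valid NDJSON. The following is a minimal sketch under stated assumptions: jsonlite is installed, the json list is a shortened stand-in for the parsed data above, and out_file is a hypothetical temporary path. It uses auto_unbox = TRUE so that scalars are written as 123456 rather than [123456].

```r
library(jsonlite)

# Stand-in for the parsed list `json` from the answer (same shape, shorter text).
json <- list(
  list(id = 123456L, name = "Try to parse this",
       description = "Line one \r\n line two"),
  list(id = 987654L, name = "Have another go",
       description = "Another \r\n two lines")
)

# One compact JSON object per record. toJSON escapes the CR/LF characters
# inside the strings, so each record stays on a single line.
ndjson_lines <- vapply(
  json,
  function(rec) as.character(toJSON(rec, auto_unbox = TRUE)),
  character(1)
)

out_file <- tempfile(fileext = ".ndjson")  # hypothetical output path
writeLines(ndjson_lines, out_file)

# Read the file back to check that it parses as NDJSON.
roundtrip <- stream_in(file(out_file), simplifyDataFrame = FALSE, verbose = FALSE)
```

The trade-off is readability: the file is no longer pretty-printed, but any NDJSON reader should be able to consume it.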