How to parse a file containing multiple stacked JSON objects in R?

Date: 2018-05-19 23:57:28

Tags: r json jsonlite

I have the following "stacked JSON" objects to read in R, stored in the file example1.json:

{"ID":"12345","Timestamp":"20140101", "Usefulness":"Yes",
  "Code":[{"event1":"A","result":"1"},…]}
{"ID":"1A35B","Timestamp":"20140102", "Usefulness":"No",
  "Code":[{"event1":"B","result":"1"},…]}
{"ID":"AA356","Timestamp":"20140103", "Usefulness":"No",
  "Code":[{"event1":"B","result":"0"},…]}

The objects are not comma-separated. The basic goal is to parse some fields (or all fields) into an R data.frame or data.table:

    Timestamp    Usefulness
 0   20140101      Yes
 1   20140102      No
 2   20140103      No

Normally, I would read JSON in R as follows:

library(jsonlite)

jsonfile = "example1.json"
foobar = fromJSON(jsonfile)

However, this throws a parse error:

Error: lexical error: invalid char in json text.
          [{"event1":"A","result":"1"},…]} {"ID":"1A35B","Timestamp"
                     (right here) ------^

This is similar to the following question, but for R: multiple Json objects in one file extract by python

Edit: this file format is called "newline-delimited JSON", or NDJSON.

1 answer:

Answer 0: (score: 2)

  1. The three dots (...) make your JSON invalid, hence the lexical error.

  2. You can use jsonlite::stream_in() to "stream in" lines of JSON:

    library(jsonlite)
    
    jsonlite::stream_in(file("~/Desktop/examples1.json"))
    # opening file input connection.
    # Imported 3 records. Simplifying...
    # closing file input connection.
    #      ID Timestamp Usefulness Code
    # 1 12345  20140101        Yes A, 1
    # 2 1A35B  20140102         No B, 1
    # 3 AA356  20140103         No B, 0
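
To get from this result to the two-column table asked for in the question, one option is to subset the returned data frame. The following is a minimal sketch, not a definitive implementation: it writes the cleaned sample lines to a temporary file (an assumption made so the example runs without a file on the Desktop), and the names `ndjson`, `tmp`, `records` and `result` are illustrative only.

```r
library(jsonlite)

# Cleaned sample records, one JSON object per line (NDJSON)
ndjson <- c(
  '{"ID":"12345","Timestamp":"20140101", "Usefulness":"Yes","Code":[{"event1":"A","result":"1"}]}',
  '{"ID":"1A35B","Timestamp":"20140102", "Usefulness":"No","Code":[{"event1":"B","result":"1"}]}',
  '{"ID":"AA356","Timestamp":"20140103", "Usefulness":"No","Code":[{"event1":"B","result":"0"}]}'
)

# Assumption: a temporary file stands in for ~/Desktop/examples1.json
tmp <- tempfile(fileext = ".json")
writeLines(ndjson, tmp)

# stream_in() opens the unopened file connection, reads it line by line, and closes it
records <- stream_in(file(tmp), verbose = FALSE)

# Keep only the fields of interest
result <- records[, c("Timestamp", "Usefulness")]
print(result)
```

The nested `Code` array comes back as a list column, which is why it prints as `A, 1` above; dropping it by column selection avoids having to flatten it.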
    

    Data

    I cleaned up your sample data to make it valid JSON and saved it to my desktop as ~/Desktop/examples1.json

    {"ID":"12345","Timestamp":"20140101", "Usefulness":"Yes","Code":[{"event1":"A","result":"1"}]}
    {"ID":"1A35B","Timestamp":"20140102", "Usefulness":"No","Code":[{"event1":"B","result":"1"}]}
    {"ID":"AA356","Timestamp":"20140103", "Usefulness":"No","Code":[{"event1":"B","result":"0"}]}
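
When it is unclear which lines of such a file are malformed, each line can be checked individually before streaming. This is a rough sketch that assumes `jsonlite::validate()` is applied to one line at a time; the vector `lines` and the `is_valid` result are hypothetical names, and the first element reproduces a record with three literal dots in the array, as in the question.

```r
library(jsonlite)

# Two sample lines: the first keeps the literal "..." placeholder (invalid JSON),
# the second is the cleaned version of the same record
lines <- c(
  '{"ID":"12345","Timestamp":"20140101", "Usefulness":"Yes","Code":[{"event1":"A","result":"1"},...]}',
  '{"ID":"12345","Timestamp":"20140101", "Usefulness":"Yes","Code":[{"event1":"A","result":"1"}]}'
)

# validate() reports whether a string is valid JSON; check each line separately
is_valid <- vapply(
  lines,
  function(x) isTRUE(validate(x)),
  logical(1),
  USE.NAMES = FALSE
)

# Positions of the lines that would make a parser fail
which(!is_valid)
```

For a file on disk, the same check could be run on the output of `readLines()` before calling `stream_in()`.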