In R

Time: 2018-05-09 11:38:57

Tags: r json

I have a csv file with the following structure:

Input

{"eid":"START","ver":"3.0","ets":1514764800238}}
{"eid":"INTERACT","ver":"3.0","ets":1514764820546}}
{"eid":"IMPRESSION","ver":"3.0","ets":895732}}
{"eid":"IMPRESSION","ver":"3.0","ets":245636}}
{"eid":"INTERACT","ver":"3.0","ets":535235423525}}

As you can see, this is not valid JSON. Valid JSON for the data above would have the following structure:

Expected output

[{"eid":"START","ver":"3.0","ets":1514764800238},
{"eid":"INTERACT","ver":"3.0","ets":1514764820546},
{"eid":"IMPRESSION","ver":"3.0","ets":895732},
{"eid":"IMPRESSION","ver":"3.0","ets":245636},
{"eid":"INTERACT","ver":"3.0","ets":535235423525}]
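
A quick way to confirm the diagnosis is to run one input line through a JSON validator before and after removing the extra brace. The sketch below assumes the jsonlite package is installed and uses its validate() function. The variable names are illustrative only.

```r
library(jsonlite)

# One line copied from the input above, with the stray closing brace.
bad <- '{"eid":"START","ver":"3.0","ets":1514764800238}}'

# The same line with the trailing "}}" reduced to a single "}".
good <- sub("\\}\\}$", "}", bad)

# validate() returns a logical; for invalid input the parser message
# is attached as the "err" attribute.
bad_ok  <- jsonlite::validate(bad)
good_ok <- jsonlite::validate(good)

print(bad_ok)
print(good_ok)
```

The first call should report the line as invalid because of the extra brace, while the repaired line should pass.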

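Question:

Ideally, I would like to read the file, fix it, and save it as JSON:

  1. Replace "}}" with "}," everywhere except on the last line
  2. Add "[" at the beginning of the file and "]" at the end
  3. I tried fromJSON (rjson) and read_delim, but I could not read the file.

    Thanks in advance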

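2 Answers:

Answer 0: (score: 2)

Manual find/replace is horrible, horrible, horrible advice for a reproducible workflow.

One option, assuming there really is a }} at the end of every line and the file is at /tmp/badlines: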

library(magrittr)
library(ndjson)

readLines("/tmp/badlines") %>%
  sub("\\}$", "", .) %>% 
  ndjson::flatten(cls = "tbl")
## # A tibble: 5 x 3
##   eid            ets ver  
##   <chr>        <dbl> <chr>
## 1 START      1.51e12 3.0  
## 2 INTERACT   1.51e12 3.0  
## 3 IMPRESSION 8.96e 5 3.0  
## 4 IMPRESSION 2.46e 5 3.0  
## 5 INTERACT   5.35e11 3.0  

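Answer 1: (score: 0)

Note that this question is nearly a duplicate of Extraction of different types of variables from a large list.

Apart from reading the file in and running fromJSON (jsonlite package), one line of base R code is enough to convert it to valid JSON (in the variable json):

  • replace "}}" with "}" on each input line using sub
  • insert commas between the lines using toString
  • surround the result with "[" and "]" using c

Code: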

library(jsonlite)

L <- readLines("test.json")
json <- c("[", toString(sub("}}", "}", L)), "]")
fromJSON(json)

giving:

         eid ver          ets
1      START 3.0 1.514765e+12
2   INTERACT 3.0 1.514765e+12
3 IMPRESSION 3.0 8.957320e+05
4 IMPRESSION 3.0 2.456360e+05
5   INTERACT 3.0 5.352354e+11
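
The question also asks for the repaired data to be saved as JSON, which the code above does not do. A minimal base R sketch of that step follows. It assumes the stray brace is always the last character on a line, and infile and outfile are hypothetical temporary paths, filled here with two of the sample lines so that the example is self-contained.

```r
# Hypothetical paths for illustration; replace them with real file names.
infile  <- tempfile(fileext = ".json")
outfile <- tempfile(fileext = ".json")

# Two lines of the malformed sample input.
writeLines(c('{"eid":"START","ver":"3.0","ets":1514764800238}}',
             '{"eid":"INTERACT","ver":"3.0","ets":1514764820546}}'),
           infile)

L <- readLines(infile)

# Drop the one extra closing brace at the end of each line.
fixed <- sub("\\}\\}$", "}", L)

# Join the objects with a comma and a newline, then wrap them in "[" and "]".
json_text <- paste0("[", paste(fixed, collapse = ",\n"), "]")

# Write the text unchanged, so the large ets values are not reformatted.
writeLines(json_text, outfile)

readLines(outfile)
```

Because the text is written out directly rather than parsed and serialised again, the numbers are stored exactly as they appeared in the input.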

Variation

This can also be written as a pipeline that gives the same output:

library(jsonlite)
library(magrittr)

"test.json" %>%
  readLines %>%           # read the file; without this step sub() would act on the file name
  sub("}}", "}", .) %>%
  toString %>%
  c("[", ., "]") %>%
  fromJSON

Note

The test input was generated with the following code:

Lines <- c('{"eid":"START","ver":"3.0","ets":1514764800238}}',
'{"eid":"INTERACT","ver":"3.0","ets":1514764820546}}',
'{"eid":"IMPRESSION","ver":"3.0","ets":895732}}',
'{"eid":"IMPRESSION","ver":"3.0","ets":245636}}',
'{"eid":"INTERACT","ver":"3.0","ets":535235423525}}')

writeLines(Lines, "test.json")