I have a csv file with the following structure:
Input
{"eid":"START","ver":"3.0","ets":1514764800238}}
{"eid":"INTERACT","ver":"3.0","ets":1514764820546}}
{"eid":"IMPRESSION","ver":"3.0","ets":895732}}
{"eid":"IMPRESSION","ver":"3.0","ets":245636}}
{"eid":"INTERACT","ver":"3.0","ets":535235423525}}
As you can see, this is not valid JSON. For valid JSON, the data above should look like this:
Expected output
[{"eid":"START","ver":"3.0","ets":1514764800238},
{"eid":"INTERACT","ver":"3.0","ets":1514764820546},
{"eid":"IMPRESSION","ver":"3.0","ets":895732},
{"eid":"IMPRESSION","ver":"3.0","ets":245636},
{"eid":"INTERACT","ver":"3.0","ets":535235423525}]
Question:
Ideally, I would like to read the file, fix it, and save it as JSON.
I tried fromJSON (rjson) and read_delim, but I could not read it.
Thanks in advance.
Answer 0 (score: 2)
Manual find/replace is terrible, terrible, terrible advice for a reproducible workflow.
One option, assuming there really is a }} at the end of every line and the file is at /tmp/badlines:
library(magrittr)
library(ndjson)
readLines("/tmp/badlines") %>%
  sub("\\}$", "", .) %>%          # drop the extra trailing "}" on each line
  ndjson::flatten(cls = "tbl")    # parse each line as a JSON object into a tibble
## # A tibble: 5 x 3
## eid ets ver
## <chr> <dbl> <chr>
## 1 START 1.51e12 3.0
## 2 INTERACT 1.51e12 3.0
## 3 IMPRESSION 8.96e 5 3.0
## 4 IMPRESSION 2.46e 5 3.0
## 5 INTERACT 5.35e11 3.0
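If you would rather not depend on ndjson, the same idea can be sketched with jsonlite instead (a swapped-in package, not what this answer uses): once the surplus } is stripped, every line is a standalone JSON object, i.e. newline-delimited JSON, which jsonlite::stream_in() reads from a connection. This is a minimal sketch under the same assumption that each line ends in exactly one extra }; a temporary file stands in for /tmp/badlines so the example runs on its own.

```r
library(jsonlite)

# Recreate the malformed input in a temporary file (stands in for /tmp/badlines)
path <- tempfile(fileext = ".txt")
writeLines(c('{"eid":"START","ver":"3.0","ets":1514764800238}}',
             '{"eid":"INTERACT","ver":"3.0","ets":1514764820546}}',
             '{"eid":"IMPRESSION","ver":"3.0","ets":895732}}',
             '{"eid":"IMPRESSION","ver":"3.0","ets":245636}}',
             '{"eid":"INTERACT","ver":"3.0","ets":535235423525}}'), path)

# Drop the single surplus "}" at the end of each line,
# leaving one complete JSON object per line (newline-delimited JSON)
fixed <- sub("\\}$", "", readLines(path))

# stream_in() parses NDJSON from a connection into a data frame
con <- textConnection(fixed)
df <- stream_in(con, verbose = FALSE)
close(con)
df
```

stream_in() is designed for large line-delimited files, so it also scales to inputs too big to paste into one string.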
Answer 1 (score: 0)
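Note that this question is nearly a duplicate of Extraction of different types of variables from a large list, except that here the file also has to be read in and fromJSON (jsonlite package) run on the result. One line of base R code converts the input to valid JSON (stored in the variable json): sub replaces "}}" with "}" on each input line, toString joins the lines with commas, and c wraps the result in "[" and "]".
Code: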
library(jsonlite)
L <- readLines("test.json")
# replace "}}" with "}", join the lines with commas, and wrap them in [ ]
json <- c("[", toString(sub("}}", "}", L)), "]")
fromJSON(json)
which gives:
eid ver ets
1 START 3.0 1.514765e+12
2 INTERACT 3.0 1.514765e+12
3 IMPRESSION 3.0 8.957320e+05
4 IMPRESSION 3.0 2.456360e+05
5 INTERACT 3.0 5.352354e+11
This can also be written as a pipeline that gives the same output:
library(jsonlite)
library(magrittr)
"test.json" %>%
readLines %>%
sub("}}", "}", .) %>%
toString %>%
c("[", ., "]") %>%
fromJSON
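In both outputs the ets column is printed in scientific notation (e.g. 1.514765e+12), but that is only how R displays a double. These timestamps are integers far below 2^53, so a double stores them exactly. A minimal sketch to check this on two of the repaired records (written inline here as already-valid JSON):

```r
library(jsonlite)

# Two of the repaired records as one valid JSON array
txt <- '[{"eid":"START","ver":"3.0","ets":1514764800238},
{"eid":"INTERACT","ver":"3.0","ets":535235423525}]'
df <- fromJSON(txt)

# Print the timestamps in full instead of scientific notation;
# returns c("1514764800238", "535235423525")
sprintf("%.0f", df$ets)

# Every value is below 2^53, so no precision was lost in the double
all(df$ets < 2^53)
```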
The test input was generated with the following code:
Lines <- c('{"eid":"START","ver":"3.0","ets":1514764800238}}',
'{"eid":"INTERACT","ver":"3.0","ets":1514764820546}}',
'{"eid":"IMPRESSION","ver":"3.0","ets":895732}}',
'{"eid":"IMPRESSION","ver":"3.0","ets":245636}}',
'{"eid":"INTERACT","ver":"3.0","ets":535235423525}}')
writeLines(Lines, "test.json")
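The question also asks to save the repaired data as JSON, which neither answer does explicitly. A minimal sketch reusing the same repair: it writes the repaired text itself rather than re-serializing a parsed data frame, so the numbers keep their original digits. The regex is anchored ("\\}\\}$") so only the trailing pair is touched. The temporary paths are stand-ins; in practice you would read test.json (or your own file) and write to a name of your choosing, such as a hypothetical fixed.json.

```r
library(jsonlite)

# Recreate the malformed input (same lines as test.json) in a temp file
bad_path <- tempfile(fileext = ".json")
writeLines(c('{"eid":"START","ver":"3.0","ets":1514764800238}}',
             '{"eid":"INTERACT","ver":"3.0","ets":1514764820546}}',
             '{"eid":"IMPRESSION","ver":"3.0","ets":895732}}',
             '{"eid":"IMPRESSION","ver":"3.0","ets":245636}}',
             '{"eid":"INTERACT","ver":"3.0","ets":535235423525}}'), bad_path)

# Replace only the trailing "}}" with "}", join the objects with commas,
# and wrap everything in [ ] to form a single JSON array
records <- sub("\\}\\}$", "}", readLines(bad_path))
json_txt <- paste0("[", paste(records, collapse = ",\n"), "]")

# Check that the repaired text parses before saving it
stopifnot(validate(json_txt))

# Save the repaired JSON (stand-in path; e.g. "fixed.json" in practice)
out_path <- tempfile(fileext = ".json")
writeLines(json_txt, out_path)

# Reading the file back confirms it is now valid JSON
fromJSON(out_path)
```

If validate() returns FALSE, its "err" attribute carries the parser message, which helps locate lines that do not follow the assumed }} pattern.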