I have a large data.log file; a few sample lines are shown below. I want to convert it into a data frame for EDA.
{"date":"2018-03-29T12:49:25.308+0000","level":"INFO","message":"User authenticated","action":"user_authenticated","username":"test@test.com"}
{"date":"2018-03-29T12:49:35.518+0000","level":"INFO","message":"User changed password with recovery (Web)","action":"recovery_password_changed","requestSource":"WEB","username":"test123@test.com"}
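I tried to load the JSON with fromJSON from the jsonlite library, but I get the error "parse error: trailing garbage". I checked the working directory and it is correct.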
mydata <- fromJSON("data.log")
Error in parse_con(txt, bigint_as_char) : parse error: trailing garbage
          ,"username":"test@test.com"} {"date":"2018-03-29T12:49:35.51
                     (right here) ------^
Answer 0: (score: 0)
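What you have here is not valid JSON: the file is a sequence of separate objects rather than a single document. You will need to preprocess it, for example by wrapping the entries in an array: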
x <- '[{"date":"2018-03-29T12:49:25.308+0000","level":"INFO","message":"User authenticated","action":"user_authenticated","username":"test@test.com"},
{"date":"2018-03-29T12:49:35.518+0000","level":"INFO","message":"User changed password with recovery (Web)","action":"recovery_password_changed","requestSource":"WEB","username":"test123@test.com"}
]'
library(jsonlite)
fromJSON(x)
                          date level                                   message
1 2018-03-29T12:49:25.308+0000  INFO                        User authenticated
2 2018-03-29T12:49:35.518+0000  INFO User changed password with recovery (Web)
                     action         username requestSource
1        user_authenticated    test@test.com          <NA>
2 recovery_password_changed test123@test.com           WEB
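The wrapping shown above can also be done in code rather than by hand. This is a minimal sketch, assuming every non-empty line of the log is one complete JSON object. The `log_lines` vector stands in for `readLines("data.log")`, and the names `json_array` and `df` are only illustrative.

```r
library(jsonlite)

# Stand-in for readLines("data.log"): one JSON object per line.
log_lines <- c(
  '{"date":"2018-03-29T12:49:25.308+0000","level":"INFO","message":"User authenticated","action":"user_authenticated","username":"test@test.com"}',
  '{"date":"2018-03-29T12:49:35.518+0000","level":"INFO","message":"User changed password with recovery (Web)","action":"recovery_password_changed","requestSource":"WEB","username":"test123@test.com"}'
)

# Drop empty lines so they do not produce stray commas in the array.
log_lines <- log_lines[nzchar(log_lines)]

# Join the objects with commas and wrap them in [ ] to form one JSON array.
json_array <- paste0("[", paste(log_lines, collapse = ","), "]")

# fromJSON simplifies an array of objects into a data frame;
# keys missing from an entry become NA.
df <- fromJSON(json_array)
str(df)
```

For a very large file this builds the whole array as a single string in memory, so a streaming reader may be a better fit.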
Or parse a single entry, one per line:
> y <- '{"date":"2018-03-29T12:49:25.308+0000","level":"INFO","message":"User authenticated","action":"user_authenticated","username":"test@test.com"}'
> fromJSON(y)
$`date`
[1] "2018-03-29T12:49:25.308+0000"
$level
[1] "INFO"
$message
[1] "User authenticated"
$action
[1] "user_authenticated"
$username
[1] "test@test.com"
If your log file has one {...} entry per line, you can loop over the lines and parse each one as JSON. Here mylog.txt contains the two entries.
xy <- readLines("mylog.txt")
sapply(xy, fromJSON, USE.NAMES = FALSE)
[[1]]
[[1]]$`date`
[1] "2018-03-29T12:49:25.308+0000"
[[1]]$level
[1] "INFO"
[[1]]$message
[1] "User authenticated"
[[1]]$action
[1] "user_authenticated"
[[1]]$username
[1] "test@test.com"
[[2]]
[[2]]$`date`
[1] "2018-03-29T12:49:35.518+0000"
[[2]]$level
[1] "INFO"
[[2]]$message
[1] "User changed password with recovery (Web)"
[[2]]$action
[1] "recovery_password_changed"
[[2]]$requestSource
[1] "WEB"
[[2]]$username
[1] "test123@test.com"
Or you can coerce each entry directly to a data.frame.
sapply(xy, FUN = function(x) {
  out <- fromJSON(x)
  as.data.frame(out)
}, USE.NAMES = FALSE)
[[1]]
                          date level            message             action
1 2018-03-29T12:49:25.308+0000  INFO User authenticated user_authenticated
       username
1 test@test.com
[[2]]
                          date level                                   message
1 2018-03-29T12:49:35.518+0000  INFO User changed password with recovery (Web)
                     action requestSource         username
1 recovery_password_changed           WEB test123@test.com
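The result above is a list of one-row data frames whose columns differ, so a plain rbind() would stop with a column-mismatch error. One way to stack them into a single data frame is sketched below, assuming jsonlite's rbind_pages() is acceptable for this purpose. The `xy` vector stands in for `readLines("mylog.txt")`, and `frames` and `combined` are illustrative names.

```r
library(jsonlite)

# Stand-in for readLines("mylog.txt"): one JSON object per line.
xy <- c(
  '{"date":"2018-03-29T12:49:25.308+0000","level":"INFO","message":"User authenticated","action":"user_authenticated","username":"test@test.com"}',
  '{"date":"2018-03-29T12:49:35.518+0000","level":"INFO","message":"User changed password with recovery (Web)","action":"recovery_password_changed","requestSource":"WEB","username":"test123@test.com"}'
)

# Parse every line into its own one-row data frame.
# lapply always returns a list, whatever the shape of the entries.
frames <- lapply(xy, function(x) {
  as.data.frame(fromJSON(x), stringsAsFactors = FALSE)
})

# rbind_pages stacks the data frames and fills columns that are
# absent from an entry (here requestSource) with NA.
combined <- rbind_pages(frames)
str(combined)
```

Using lapply instead of sapply avoids the case where sapply silently simplifies the result to a matrix when every entry happens to have the same fields.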
Answer 1: (score: 0)
You can use ndjson::stream_in() or jsonlite::stream_in(). What you have is newline-delimited JSON (NDJSON), a format that is very common these days.
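As a rough illustration of the jsonlite route, the sketch below writes the two sample entries to a temporary file and reads them back with stream_in(). The temporary file is only a stand-in for data.log, and the names `tmp` and `logs` are illustrative. In real use the connection would be `file("data.log")`.

```r
library(jsonlite)

# Write the two sample entries to a temporary file, one object per line.
tmp <- tempfile(fileext = ".log")
writeLines(c(
  '{"date":"2018-03-29T12:49:25.308+0000","level":"INFO","message":"User authenticated","action":"user_authenticated","username":"test@test.com"}',
  '{"date":"2018-03-29T12:49:35.518+0000","level":"INFO","message":"User changed password with recovery (Web)","action":"recovery_password_changed","requestSource":"WEB","username":"test123@test.com"}'
), tmp)

# stream_in reads newline-delimited JSON from a connection and returns
# a data frame; it opens and closes an unopened connection itself.
logs <- stream_in(file(tmp), verbose = FALSE)
str(logs)

# Remove the temporary file once the data is loaded.
unlink(tmp)
```

stream_in() reads the input in pages, which makes it a reasonable choice for a large log file.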