Loading a .log file in R

Time: 2018-08-19 19:04:57

Tags: r json jsonlite

I have a large data.log file; a few sample lines are below. I want to convert it into a data frame for EDA.

{"date":"2018-03-29T12:49:25.308+0000","level":"INFO","message":"User authenticated","action":"user_authenticated","username":"test@test.com"}
{"date":"2018-03-29T12:49:35.518+0000","level":"INFO","message":"User changed password with recovery (Web)","action":"recovery_password_changed","requestSource":"WEB","username":"test123@test.com"}

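I tried loading it with fromJSON() from the jsonlite library, but I get the error "parse error: trailing garbage". I checked the working directory and everything is fine there.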

mydata <- fromJSON("data.log")
  

Error in parse_con(txt, bigint_as_char) : parse error: trailing garbage
          ,"username":"test@test.com"} {"date":"2018-03-29T12:49:35.51
                     (right here) ------^

2 answers:

Answer 0 (score: 0)

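What you have is not valid JSON as a single document: each line is a valid JSON object, but several objects in a row are not. You will need to preprocess it, for example by wrapping the records in an array: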

x <- '[{"date":"2018-03-29T12:49:25.308+0000","level":"INFO","message":"User authenticated","action":"user_authenticated","username":"test@test.com"},
{"date":"2018-03-29T12:49:35.518+0000","level":"INFO","message":"User changed password with recovery (Web)","action":"recovery_password_changed","requestSource":"WEB","username":"test123@test.com"}
]'

library(jsonlite)

fromJSON(x)

                          date level                                   message
1 2018-03-29T12:49:25.308+0000  INFO                        User authenticated
2 2018-03-29T12:49:35.518+0000  INFO User changed password with recovery (Web)
                     action         username requestSource
1        user_authenticated    test@test.com          <NA>
2 recovery_password_changed test123@test.com           WEB
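Instead of pasting the records into a string by hand, the same preprocessing can be done programmatically: read the lines, drop blank ones, join them with commas and wrap the result in brackets. This is a minimal sketch, assuming the file holds exactly one complete JSON object per line; the temporary file below is only a stand-in for the question's data.log.

```r
library(jsonlite)

# Stand-in for data.log: write the two sample records, one object per line
log_file <- tempfile(fileext = ".log")
writeLines(c(
  '{"date":"2018-03-29T12:49:25.308+0000","level":"INFO","message":"User authenticated","action":"user_authenticated","username":"test@test.com"}',
  '{"date":"2018-03-29T12:49:35.518+0000","level":"INFO","message":"User changed password with recovery (Web)","action":"recovery_password_changed","requestSource":"WEB","username":"test123@test.com"}'
), log_file)

lines <- readLines(log_file)
# Blank lines would produce ",," and an invalid array, so drop them
lines <- lines[nzchar(trimws(lines))]

# Turn "obj<newline>obj" into "[obj,obj]" so the whole file is one JSON array
json_array <- paste0("[", paste(lines, collapse = ","), "]")

# An array of objects simplifies to a data frame; absent fields become NA
logs <- fromJSON(json_array)
str(logs)
```

Because the whole file ends up in one string, this suits files that fit comfortably in memory; for very large logs the streaming approach is a better fit.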

Or parse one entry at a time:

> y <- '{"date":"2018-03-29T12:49:25.308+0000","level":"INFO","message":"User authenticated","action":"user_authenticated","username":"test@test.com"}'
> fromJSON(y)
$`date`
[1] "2018-03-29T12:49:25.308+0000"

$level
[1] "INFO"

$message
[1] "User authenticated"

$action
[1] "user_authenticated"

$username
[1] "test@test.com"

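If your log file has one {...} entry per line, you can loop over the lines and parse each one as JSON. Here mylog.txt contains the two entries.

xy <- readLines("mylog.txt")
# lapply() always returns a list, one parsed entry per line
lapply(xy, fromJSON)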

[[1]]
[[1]]$`date`
[1] "2018-03-29T12:49:25.308+0000"

[[1]]$level
[1] "INFO"

[[1]]$message
[1] "User authenticated"

[[1]]$action
[1] "user_authenticated"

[[1]]$username
[1] "test@test.com"


[[2]]
[[2]]$`date`
[1] "2018-03-29T12:49:35.518+0000"

[[2]]$level
[1] "INFO"

[[2]]$message
[1] "User changed password with recovery (Web)"

[[2]]$action
[1] "recovery_password_changed"

[[2]]$requestSource
[1] "WEB"

[[2]]$username
[1] "test123@test.com"

Or you can coerce each entry directly to a data.frame.

# Parse each line and turn it into a one-row data frame
lapply(xy, FUN = function(x) {
  out <- fromJSON(x)
  as.data.frame(out)
})

[[1]]
                          date level            message             action
1 2018-03-29T12:49:25.308+0000  INFO User authenticated user_authenticated
       username
1 test@test.com

[[2]]
                          date level                                   message
1 2018-03-29T12:49:35.518+0000  INFO User changed password with recovery (Web)
                     action requestSource         username
1 recovery_password_changed           WEB test123@test.com
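For EDA you usually want a single data frame rather than a list of one-row frames. A minimal sketch using jsonlite's rbind_pages(), which stacks a list of data frames and fills columns missing from some of them (here requestSource) with NA; the two inline strings are an assumed stand-in for readLines("mylog.txt").

```r
library(jsonlite)

# Stand-in for xy <- readLines("mylog.txt")
xy <- c(
  '{"date":"2018-03-29T12:49:25.308+0000","level":"INFO","message":"User authenticated","action":"user_authenticated","username":"test@test.com"}',
  '{"date":"2018-03-29T12:49:35.518+0000","level":"INFO","message":"User changed password with recovery (Web)","action":"recovery_password_changed","requestSource":"WEB","username":"test123@test.com"}'
)

# One single-row data frame per log line, keeping strings as character
rows <- lapply(xy, function(x) {
  as.data.frame(fromJSON(x), stringsAsFactors = FALSE)
})

# Stack the rows; a field absent from a line becomes NA in that row
logs <- rbind_pages(rows)
logs
```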

Answer 1 (score: 0)

You can use ndjson::stream_in() or jsonlite::stream_in(). What you have is newline-delimited JSON (NDJSON, one JSON object per line), which is very common these days.
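A minimal sketch with jsonlite::stream_in(), which reads a connection line by line and parses each line as a separate JSON record. For the real file you would pass file("data.log") as the connection; here a textConnection over the two sample lines is an assumed stand-in so the example runs on its own.

```r
library(jsonlite)

log_lines <- c(
  '{"date":"2018-03-29T12:49:25.308+0000","level":"INFO","message":"User authenticated","action":"user_authenticated","username":"test@test.com"}',
  '{"date":"2018-03-29T12:49:35.518+0000","level":"INFO","message":"User changed password with recovery (Web)","action":"recovery_password_changed","requestSource":"WEB","username":"test123@test.com"}'
)

# For the real log: logs <- stream_in(file("data.log"))
# verbose = FALSE suppresses the progress messages
logs <- stream_in(textConnection(log_lines), verbose = FALSE)
str(logs)
```

Since stream_in() parses the input in pages (500 lines by default), it avoids building one huge JSON string, which matters for a large log file.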