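I am trying to use Pig to parse a bunch of log data. Unfortunately, the data for one command is spread across multiple lines (it's an audit log). I know there is an id that ties together all of the log messages for a command, and that there are different record types that each contain part of the whole, but I am unsure how to gather them all into one message.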
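I split the messages by type and then joined them on the id, but because there is a one-to-many relationship between SYSCALL and PATH records, that doesn't gather all of the information onto one line. I can group by id, but then I want to pull the same field (name) out of every PATH tuple, and I don't know of a way to do that.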
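To make the one-to-many problem concrete, here is a minimal Python sketch (not Pig) of the group-then-fold step. It assumes the records have already been split into hypothetical `(event_id, record_type, fields)` triples. Grouping by the audit event id keeps one output row per command, and each PATH record's `name` is appended to a list instead of producing an extra joined row. The second PATH entry is a placeholder: the example further down shows only `item=0`, although its SYSCALL line reports `items=2`.

```python
from collections import defaultdict

# Hypothetical pre-parsed audit records: (event_id, record_type, fields).
# The second PATH entry is a placeholder, not real data from the log.
RECORDS = [
    ("1389047402.069:4455727", "SYSCALL", {"syscall": "59", "exe": "/sbin/ip", "items": "2"}),
    ("1389047402.069:4455727", "CWD", {"cwd": "/root"}),
    ("1389047402.069:4455727", "PATH", {"item": "0", "name": "/sbin/ip"}),
    ("1389047402.069:4455727", "PATH", {"item": "1", "name": "/placeholder/second/path"}),
]


def collapse_events(records):
    """Group records by event id and fold each group into a single dict.

    Single-occurrence record types (SYSCALL, EXECVE, CWD) have their fields
    merged directly. PATH records occur many times per event, so their names
    are collected into a list ordered by the PATH "item" number.
    """
    grouped = defaultdict(list)
    for event_id, rec_type, fields in records:
        grouped[event_id].append((rec_type, fields))

    events = {}
    for event_id, members in grouped.items():
        merged = {}
        paths = []
        for rec_type, fields in members:
            if rec_type == "PATH":
                paths.append((int(fields.get("item", 0)), fields.get("name")))
            else:
                merged.update(fields)
        merged["names"] = [name for _, name in sorted(paths, key=lambda p: p[0])]
        events[event_id] = merged
    return events


if __name__ == "__main__":
    for event_id, event in collapse_events(RECORDS).items():
        print(event_id, event["exe"], event["names"])
```

In Pig terms, this corresponds to a `GROUP ... BY id` followed by a FOREACH over each group's bags, rather than a JOIN.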
Should I just write my own UDF? FOREACH doesn't keep state, so I can't concatenate the name field from each tuple.
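A FOREACH that runs after a GROUP sees each group's whole bag at once, so concatenating names does not require state across rows. Below is a sketch of such a UDF. It assumes Pig's Jython UDF support, which hands a bag to Python as a list of tuples and provides the `outputSchema` decorator. The fallback definition exists only so the file can run outside Pig. `concat_names` and its `name_index` parameter are hypothetical, and `name_index` must match the position of `name` in your PATH tuples. On Pig 0.11 or later, the built-in `BagToString` may make a custom UDF unnecessary.

```python
# Pig defines outputSchema for scripts registered USING jython; this
# no-op fallback only lets the file be imported and tested outside Pig.
try:
    outputSchema
except NameError:
    def outputSchema(schema):
        def decorator(func):
            return func
        return decorator


@outputSchema("names:chararray")
def concat_names(bag, name_index=1, sep=","):
    """Join the name field of every tuple in a grouped bag of PATH records.

    bag        -- list of tuples (how a Jython UDF receives a Pig bag), or None
    name_index -- hypothetical position of the name field within each tuple
    sep        -- delimiter placed between names
    """
    if bag is None:
        return None
    names = [
        t[name_index]
        for t in bag
        if t is not None and len(t) > name_index and t[name_index] is not None
    ]
    return sep.join(names)
```

After `REGISTER 'audit_udfs.py' USING jython AS audit_udfs;`, it could be called as `audit_udfs.concat_names(paths)` in the FOREACH that follows the GROUP BY, where `paths` stands for whatever alias holds the PATH tuples.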
Edit, adding an example:
{"message":"Jan  6 15:30:11 r01sv06 auditd: node=r01sv06 type=SYSCALL msg=audit(1389047402.069:4455727): arch=c000003e syscall=59 success=yes exit=0 a0=7fff8ef30600 a1=7fff8ef30630 a2=270f950 a3=fffffffffffffff0 items=2 ppid=1493 pid=1685 auid=0 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=(none) ses=8917 comm=\"ip\" exe=\"/sbin/ip\" key=\"command\"","@timestamp":"2014-01-06T22:30:14.642Z","@version":"1","type":"audit","host":"r01sv09a","path":"/data/logs/audit.log","syslog_timestamp":"Jan  6 15:30:11","syslog_program":"auditd","received_at":"2014-01-06 22:30:14 UTC","received_from":"r01sv06","syslog_severity_code":5,"syslog_facility_code":1,"syslog_facility":"user-level","syslog_severity":"notice","@source_host":"r01sv06"}
{"message":"Jan  6 15:30:11 r01sv06 auditd: node=r01sv06 type=EXECVE msg=audit(1389047402.069:4455727): argc=4 a0=\"/sbin/ip\" a1=\"link\" a2=\"show\" a3=\"lo\"","@timestamp":"2014-01-06T22:30:14.643Z","@version":"1","type":"audit","host":"r01sv09a","path":"/data/logs/audit.log","syslog_timestamp":"Jan  6 15:30:11","syslog_program":"auditd","received_at":"2014-01-06 22:30:14 UTC","received_from":"r01sv06","syslog_severity_code":5,"syslog_facility_code":1,"syslog_facility":"user-level","syslog_severity":"notice","@source_host":"r01sv06"}
{"message":"Jan  6 15:30:11 r01sv06 auditd: node=r01sv06 type=CWD msg=audit(1389047402.069:4455727):  cwd=\"/root\"","@timestamp":"2014-01-06T22:30:14.644Z","@version":"1","type":"audit","host":"r01sv09a","path":"/data/logs/audit.log","syslog_timestamp":"Jan  6 15:30:11","syslog_program":"auditd","received_at":"2014-01-06 22:30:14 UTC","received_from":"r01sv06","syslog_severity_code":5,"syslog_facility_code":1,"syslog_facility":"user-level","syslog_severity":"notice","@source_host":"r01sv06"}
{"message":"Jan  6 15:30:11 r01sv06 auditd: node=r01sv06 type=PATH msg=audit(1389047402.069:4455727): item=0 name=\"/sbin/ip\" inode=1703996 dev=08:02 mode=0100755 ouid=0 ogid=0 rdev=00:00","@timestamp":"2014-01-06T22:30:14.645Z","@version":"1","type":"audit","host":"r01sv09a","path":"/data/logs/audit.log","syslog_timestamp":"Jan  6 15:30:11","syslog_program":"auditd","received_at":"2014-01-06 22:30:14 UTC","received_from":"r01sv06","syslog_severity_code":5,"syslog_facility_code":1,"syslog_facility":"user-level","syslog_severity":"notice","@source_host":"r01sv06"}
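To prototype the extraction outside Pig, here is a minimal Python sketch that turns one of the JSON lines above into an `(event_id, record_type, fields)` triple. It assumes each input line is a single Logstash JSON object whose `message` field holds the raw auditd text, as in these four samples. The two regexes (`HEADER_RE`, `FIELD_RE`) are fitted to these samples only and may not cover every auditd record format. The event id is the `timestamp:serial` pair inside `msg=audit(...)`, which every record of one command shares.

```python
import json
import re

# "type=PATH msg=audit(1389047402.069:4455727):" -> record type and event id.
# The event id (timestamp:serial) is shared by every record of one command.
HEADER_RE = re.compile(r"type=(\w+) msg=audit\(([\d.]+:\d+)\):")
# key=value pairs; a value is either a double-quoted string or runs to the next space.
FIELD_RE = re.compile(r'(\w+)=("[^"]*"|\S+)')


def parse_audit_line(line):
    """Parse one Logstash JSON line into (event_id, record_type, fields).

    Returns None when the message does not look like an auditd record.
    """
    message = json.loads(line)["message"]
    header = HEADER_RE.search(message)
    if header is None:
        return None
    record_type, event_id = header.groups()
    body = message[header.end():]
    # Strip the surrounding quotes from quoted values such as name="/sbin/ip".
    fields = {key: value.strip('"') for key, value in FIELD_RE.findall(body)}
    return event_id, record_type, fields


if __name__ == "__main__":
    sample = json.dumps({
        "message": 'Jan  6 15:30:11 r01sv06 auditd: node=r01sv06 type=CWD '
                   'msg=audit(1389047402.069:4455727):  cwd="/root"',
        "type": "audit",
    })
    print(parse_audit_line(sample))
```

Only the text after the `msg=audit(...):` header is split into fields, so the `node=` value in the syslog prefix is deliberately skipped.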