I want to parse this log sample:
May 3 11:52:54 cdh-dn03 init: tty (/dev/tty6) main process (1208) killed by TERM signal
May 3 11:53:31 cdh-dn03 kernel: registered taskstats version 1
May 3 11:53:31 cdh-dn03 kernel: sr0: scsi3-mmc drive: 32x/32x xa/form2 tray
May 3 11:53:31 cdh-dn03 kernel: piix4_smbus 0000:00:07.0: SMBus base address uninitialized - upgrade BIOS or use force_addr=0xaddr
May 3 11:53:31 cdh-dn03 kernel: nf_conntrack version 0.5.0 (7972 buckets, 31888 max)
May 3 11:53:57 cdh-dn03 kernel: hrtimer: interrupt took 11250457 ns
May 3 11:53:59 cdh-dn03 ntpd_initres[1705]: host name not found: 0.rhel.pool.ntp.org
This is how I created the table and loaded the data into it:
CREATE TABLE LogParserSample(
month_name STRING, day STRING, time STRING, host STRING, event STRING, log STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
'input.regex' = '(^(\S+))\s+(\S+)\s+(\S+)\s+(\S+)\s+(\S+)\s+((\S+.)*)')
stored as textfile;
I'm using these sites to generate the regular expressions.
These are the two regexes I've tried:
(\w{3})\s+(\w{1})\s+(\S+)\s+(\S+)\s+(\S+)\s+((\S+.)*)
(^(\S+))\s+(\S+)\s+(\S+)\s+(\S+)\s+(\S+)\s+((\S+.)*)
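One thing worth checking independently of escaping is how many capture groups each pattern defines, since (as I understand it) RegexSerDe fills the table's columns from the groups in order. The sketch below simply counts the groups with java.util.regex; the class and method names are made up for illustration. Note also that (\w{1}) in the first pattern would only match single-digit days.

```java
import java.util.regex.Pattern;

public class GroupCountCheck {
    // Columns declared in LogParserSample: month_name, day, time, host, event, log
    static final int NUM_COLUMNS = 6;

    // Returns how many capturing groups the regex defines.
    static int groupCount(String regex) {
        return Pattern.compile(regex).matcher("").groupCount();
    }

    public static void main(String[] args) {
        // The two patterns from the question, written as Java string literals
        String[] candidates = {
            "(\\w{3})\\s+(\\w{1})\\s+(\\S+)\\s+(\\S+)\\s+(\\S+)\\s+((\\S+.)*)",
            "(^(\\S+))\\s+(\\S+)\\s+(\\S+)\\s+(\\S+)\\s+(\\S+)\\s+((\\S+.)*)"
        };
        for (String regex : candidates) {
            System.out.printf("%d groups for %d columns: %s%n",
                    groupCount(regex), NUM_COLUMNS, regex);
        }
    }
}
```

The first pattern defines 7 groups and the second 8, because the nested groups in (^(\S+)) and ((\S+.)*) each count separately, while the table has only 6 columns.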
Loading the data and querying it:
load data local inpath '/home/programmeur_v/serde_dataset.txt' into table LogParserSample;
select * from LogParserSample;
Every column in the output is NULL:
hive> select * from LogParserSample;
OK
NULL NULL NULL NULL NULL NULL NULL
NULL NULL NULL NULL NULL NULL NULL
NULL NULL NULL NULL NULL NULL NULL
NULL NULL NULL NULL NULL NULL NULL
NULL NULL NULL NULL NULL NULL NULL
NULL NULL NULL NULL NULL NULL NULL
NULL NULL NULL NULL NULL NULL NULL
Time taken: 0.094 seconds, Fetched: 7 row(s)
I'm new to Hive, so I don't know what exactly the problem is.
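Answer
When creating a table with the RegexSerDe, you need to escape the backslashes in input.regex (write \\S and \\s instead of \S and \s); otherwise Hive strips the single backslashes out of the string literal and the resulting pattern matches none of the lines.
Try the following DDL: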
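To see why a single backslash produces all-NULL rows, here is a simplified model of the unescaping Hive applies to a quoted string literal. It is an assumption on my part that Hive drops a backslash before any character it does not treat as a special escape (so \S becomes S) and turns \\ into one backslash; only the cases relevant here are modelled, and unescapeLiteral is a hypothetical helper, not a Hive API.

```java
import java.util.regex.Pattern;

public class HiveLiteralDemo {
    // Simplified model (assumption) of Hive's string-literal unescaping:
    // "\n" and "\t" become control characters, "\\" becomes "\",
    // and any other "\X" becomes just "X" (the backslash is dropped).
    static String unescapeLiteral(String s) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '\\' && i + 1 < s.length()) {
                char next = s.charAt(++i);
                switch (next) {
                    case 'n': out.append('\n'); break;
                    case 't': out.append('\t'); break;
                    default:  out.append(next); // "\S" -> "S", "\\" -> "\"
                }
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String line = "May 3 11:53:31 cdh-dn03 kernel: registered taskstats version 1";
        // Exactly what was typed between the quotes in each DDL statement
        String questionText = "(^(\\S+))\\s+(\\S+)\\s+(\\S+)\\s+(\\S+)\\s+(\\S+)\\s+((\\S+.)*)";
        String answerText = "(\\\\S+)\\\\s+(\\\\S+)\\\\s+(\\\\S+)\\\\s+(\\\\S+)\\\\s+(\\\\S+)\\\\s+(.*)";
        for (String typed : new String[] {questionText, answerText}) {
            String regex = unescapeLiteral(typed);
            boolean matches = Pattern.compile(regex).matcher(line).matches();
            System.out.println(regex + " -> matches: " + matches);
        }
    }
}
```

Under this model the question's literal turns into (^(S+))s+(S+)..., which requires the line to start with the letter S, so no row matches; the doubled version turns back into (\S+)\s+..., which does match.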
hive> CREATE TABLE LogParserSample(
month_name STRING, day STRING, time STRING, host STRING, event STRING, log STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
'input.regex' = '(\\S+)\\s+(\\S+)\\s+(\\S+)\\s+(\\S+)\\s+(\\S+)\\s+(.*)')
stored as textfile;
hive> select * from LogParserSample;
+-------------+------+-----------+-----------+----------------------+-----------------------------------------------------------------------------------------------------+--+
| month_name | day | time | host | event | log |
+-------------+------+-----------+-----------+----------------------+-----------------------------------------------------------------------------------------------------+--+
| May | 3 | 11:52:54 | cdh-dn03 | init: | tty (/dev/tty6) main process (1208) killed by TERM signal |
| May | 3 | 11:53:31 | cdh-dn03 | kernel: | registered taskstats version 1 |
| May | 3 | 11:53:31 | cdh-dn03 | kernel: | sr0: scsi3-mmc drive: 32x/32x xa/form2 tray |
| May | 3 | 11:53:31 | cdh-dn03 | kernel: | piix4_smbus 0000:00:07.0: SMBus base address uninitialized - upgrade BIOS or use force_addr=0xaddr |
| May | 3 | 11:53:31 | cdh-dn03 | kernel: | nf_conntrack version 0.5.0 (7972 buckets, 31888 max) |
| May | 3 | 11:53:57 | cdh-dn03 | kernel: | hrtimer: interrupt took 11250457 ns |
| May | 3 | 11:53:59 | cdh-dn03 | ntpd_initres[1705]: | host name not found: 0.rhel.pool.ntp.org |
+-------------+------+-----------+-----------+----------------------+-----------------------------------------------------------------------------------------------------+--+
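The result above can be reproduced outside Hive by running the same pattern through java.util.regex. This is a simplified stand-in for RegexSerDe, assuming group i fills column i and a line that does not match as a whole yields NULLs (consistent with the all-NULL rows in the question); LogLineParser and parse are illustrative names only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LogLineParser {
    // The answer's input.regex after Hive has turned each "\\" into "\"
    // (backslashes are doubled again here only because this is Java source).
    static final Pattern LOG_LINE = Pattern.compile(
            "(\\S+)\\s+(\\S+)\\s+(\\S+)\\s+(\\S+)\\s+(\\S+)\\s+(.*)");

    // Assumed RegexSerDe-like behaviour: group i -> column i,
    // and a non-matching line produces NULL for every column.
    static List<String> parse(String line) {
        Matcher m = LOG_LINE.matcher(line);
        boolean matched = m.matches(); // the whole line must match
        List<String> columns = new ArrayList<>();
        for (int g = 1; g <= 6; g++) {
            columns.add(matched ? m.group(g) : null);
        }
        return columns;
    }

    public static void main(String[] args) {
        String[] sample = {
            "May 3 11:52:54 cdh-dn03 init: tty (/dev/tty6) main process (1208) killed by TERM signal",
            "May 3 11:53:59 cdh-dn03 ntpd_initres[1705]: host name not found: 0.rhel.pool.ntp.org"
        };
        for (String line : sample) {
            System.out.println(parse(line));
        }
    }
}
```

The final (.*) group soaks up the rest of the message, spaces included, which is why the log column keeps the full text while the first five columns split on whitespace.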
Use this link to generate the Java-escaped equivalent of a regex.
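If the online converter isn't handy, the escaping step is small enough to do yourself. This is a hypothetical helper, not part of Hive: it doubles every backslash and escapes single quotes so a regex tested elsewhere can be pasted between the quotes of 'input.regex' = '...'.

```java
public class HiveRegexEscaper {
    // Hypothetical helper: convert a raw regex into text that survives
    // Hive's single-quoted string literal unescaping.
    static String toHiveLiteral(String regex) {
        return regex
                .replace("\\", "\\\\")  // each backslash becomes two (must run first)
                .replace("'", "\\'");   // escape single quotes inside '...'
    }

    public static void main(String[] args) {
        // Raw regex as it would be typed into a regex tester
        String tested = "(\\S+)\\s+(\\S+)\\s+(\\S+)\\s+(\\S+)\\s+(\\S+)\\s+(.*)";
        System.out.println("'input.regex' = '" + toHiveLiteral(tested) + "'");
    }
}
```

Backslashes are doubled before quotes are escaped so that the backslash added in front of a quote is not itself doubled. For the regex above, the printed property matches the one in the DDL.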