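I have the following 5 lines of data in a file on HDFS and I want to load them into a table. I am using a regular expression (RegexSerDe), but for every line of data it loads an extra row of NULLs. Does anyone know why this happens?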
19/Mar/2018 3:00:06 INFO activity Submitted to Splunk
19/Mar/2018 3:00:20 INFO activity response received statuscode=200 bytesreceived=11548264
19/Mar/2018 3:00:21 INFO activity done writing K:\Data\031818\activity_031818.csv lineswritten=296110
19/Mar/2018 3:00:21 INFO hardware Submitted to Splunk
I use this to create the table:
create table Splunk_BCO_MSR
(
ts string,
status string,
area string,
text string
)
partitioned by (partition_dt date)
ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
WITH SERDEPROPERTIES ("input.regex" = "([^ ]+[ ][^ ]*) ([^ ]*) ([^ ]*) (.*)?");
This almost works, but when I run select * on the table I get 8 rows instead of 4. It looks like an extra row of NULLs is added after each data row.
| 19/Mar/2018 3:00:06 | INFO | activity | Submitted to Splunk | 2018-03-18 |
| NULL | NULL | NULL | NULL | 2018-03-18 |
| 19/Mar/2018 3:00:20 | INFO | activity | response received statuscode=200 bytesreceived=11548264 | 2018-03-18 |
| NULL | NULL | NULL | NULL | 2018-03-18 |
| 19/Mar/2018 3:00:21 | INFO | activity | done writing K:\Data\031818\activity_031818.csv lineswritten=296110 | 2018-03-18 |
| NULL | NULL | NULL | NULL | 2018-03-18 |
| 19/Mar/2018 3:00:21 | INFO | hardware | Submitted to Splunk | 2018-03-18 |
| NULL | NULL | NULL | NULL | 2018-03-18 |
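
One way to narrow this down outside Hive is to run the same pattern over sample lines and see which ones fail to match. As far as I understand, RegexSerDe returns NULL for every column when a line does not match the whole regex, so any non-matching line in the file (a blank line, for example) would show up as a row of NULLs. The sketch below is only an approximation: it uses Python's `re` module in place of Java's regex engine, `parse_line` is a hypothetical helper, and the blank line in `sample` is an assumption about the file, not something confirmed from the data above.

```python
import re

# Same pattern as in SERDEPROPERTIES. Python's `re` stands in for Java's
# regex engine here (assumption: both behave the same for this pattern).
INPUT_REGEX = r"([^ ]+[ ][^ ]*) ([^ ]*) ([^ ]*) (.*)?"


def parse_line(line):
    """Return the four captured columns, or four Nones if the whole line does not match."""
    match = re.fullmatch(INPUT_REGEX, line)
    if match is None:
        # Mimics the all-NULL row seen in the select * output.
        return (None, None, None, None)
    return match.groups()


sample = [
    "19/Mar/2018 3:00:06 INFO activity Submitted to Splunk",
    "",  # hypothetical blank line; not confirmed to exist in the real file
    "19/Mar/2018 3:00:21 INFO hardware Submitted to Splunk",
]

rows = [parse_line(line) for line in sample]
for row in rows:
    print(row)
```

If the real file turns out to contain such non-matching lines, viewing it with something like `hdfs dfs -cat <path> | cat -A` should make blank lines and stray control characters visible.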