I have the following Hive table:
CREATE external TABLE apacheLogs4(
ip STRING,
instance STRING,
time STRING,
request STRING,
status STRING,
size STRING,
referer STRING,
agent STRING,
last STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
"input.regex" = "\"([^ ]*)\" ([^ ]*) - - \\[(.*)\\] ([^ \"]*|\"[^\"]*\") (-|[0-9]*) (-| [0-9]*) ([^ \"]*|\"[^\"]*\") ([^ \"]*|\"[^\"]*\") ([^ ]*)",
"output.format.string" = "%1$s %2$s %3$s %4$s %5$s %6$s %7$s %8$s %9$s"
)
STORED AS TEXTFILE
LOCATION '/home/~user/Documents/apache_logs2';
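I want to apply regular expressions to the ip and agent columns to extract the country and the browser type of each record before inserting the records into a new table.
How can I do this in Hive?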
Answer 0 (score: 0)
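As a minimal sketch of one possible approach (not a definitive solution), Hive's built-in regexp_extract function can pull the browser family out of the agent column inside an INSERT ... SELECT. The target table apacheLogsEnriched, its columns, and the list of browser names are assumptions made for illustration. A regular expression alone cannot map an IP address to a country; that step would likely need a join against a GeoIP lookup table or a custom UDF, neither of which is shown here.

```sql
-- Hypothetical target table: the name and column set are assumptions.
CREATE TABLE IF NOT EXISTS apacheLogsEnriched (
  ip      STRING,
  browser STRING,
  request STRING,
  status  STRING
);

-- Copy rows from the source table, deriving the browser from the agent string.
-- regexp_extract(subject, pattern, index) returns capture group `index`.
-- The browser list below is illustrative, not exhaustive.
INSERT OVERWRITE TABLE apacheLogsEnriched
SELECT
  ip,
  regexp_extract(agent, '(Firefox|Chrome|Safari|Opera|MSIE)', 1) AS browser,
  request,
  status
FROM apacheLogs4;
```

Because a regex matches at the leftmost position in the string, a typical Chrome user agent (which also contains "Safari" later in the string) would be reported as Chrome under this pattern. Rows whose agent matches none of the listed names would need separate handling.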