This query:
CREATE EXTERNAL TABLE IF NOT EXISTS s3_logs (
  owner STRING,
  bucket STRING,
  time STRING,
  remote_ip STRING,
  requester STRING,
  request_id STRING,
  operation STRING,
  key STRING,
  request_uri STRING,
  http_status STRING,
  error_code STRING,
  bytes_sent INT,
  object_size INT,
  total_time INT,
  turn_around_time INT,
  referrer STRING,
  user_agent STRING,
  version_id STRING
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
"input.regex" = "([^ ]+) ([^ ]+) (\[[^\\]*\]) ([^ ]+) ([^ ]+) ([^ ]+) ([^ ]+) (-|[^ ]) (\".*?\") ([0-9]{3}) (-|[0-9]+) ([0-9]+) (-|[^ ]) ([0-9]+) (-|[^ ]) ([^ ]+) ([^ ]+) (-|[^ ])" )
LOCATION 's3://blah-logs'
fails with this error when run:
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask.
java.util.regex.PatternSyntaxException:
Unclosed character class near index 156
([^ ]+) ([^ ]+) ([[^\]*]) ([^ ]+) ([^ ]+) ([^ ]+) ([^ ]+) (-|[^ ]) (".*?") ([0-9]{3}) (-|[0-9]+) ([0-9]+) (-|[^ ]) ([0-9]+) (-|[^ ]) ([^ ]+) ([^ ]+) (-|[^ ]) ^
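Comparing the regex in the DDL with the one quoted in the error suggests that one level of backslashes was stripped from the string literal before the pattern reached java.util.regex (`\[` became `[`, `\\` became `\`, `\"` became `"`). The Java sketch below reproduces that under an assumption: `unescapeLiteral` is a hypothetical, simplified stand-in for Hive's literal handling, not Hive's actual code. Applied to the regex from the DDL, it yields the pattern shown in the error. `Pattern.compile` then rejects it because the first `[` (originally the escaped literal `\[`) now opens a character class, `[^\]*]` is read as a nested class inside it, and the outer class is never closed before the end of the pattern.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

class HiveUnescapeRepro {
    // input.regex exactly as written between the double quotes in the DDL
    // (Java source adds its own layer of escaping on top).
    static final String TYPED =
        "([^ ]+) ([^ ]+) (\\[[^\\\\]*\\]) ([^ ]+) ([^ ]+) ([^ ]+) ([^ ]+) (-|[^ ]) "
        + "(\\\".*?\\\") ([0-9]{3}) (-|[0-9]+) ([0-9]+) (-|[^ ]) ([0-9]+) (-|[^ ]) "
        + "([^ ]+) ([^ ]+) (-|[^ ])";

    // Hypothetical, simplified model of string-literal unescaping: a backslash
    // is dropped and the character after it is kept, so \[ becomes [,
    // \\ becomes \ and \" becomes ". Real literal parsing also handles
    // newline, tab, octal and unicode escapes, which this sketch ignores.
    static String unescapeLiteral(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '\\' && i + 1 < s.length()) {
                i++;                     // skip the backslash...
                out.append(s.charAt(i)); // ...and keep the escaped character
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    // Returns the PatternSyntaxException description, or null if the regex compiles.
    static String compileError(String regex) {
        try {
            Pattern.compile(regex);
            return null;
        } catch (PatternSyntaxException e) {
            return e.getDescription();
        }
    }

    public static void main(String[] args) {
        String received = unescapeLiteral(TYPED);
        System.out.println(received);               // the pattern quoted in the error message
        System.out.println(compileError(received)); // the compile error description
    }
}
```

Running `main` prints the unescaped pattern followed by `Unclosed character class`, matching the exception above.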
This regex matches my input on https://regex101.com and in another Java regex checker.
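For comparison, the same regex compiled as typed, with its backslashes intact as it would be entered in an online tester, is accepted by java.util.regex, which fits the checker results above. A minimal sketch (the class and constant names are illustrative, not from the original) that also confirms the pattern has 18 capturing groups, one per column declared for `s3_logs`:

```java
import java.util.regex.Pattern;

class RegexCheck {
    // input.regex as typed, i.e. before any string-literal unescaping
    // (Java source adds its own layer of escaping on top).
    static final String AS_TYPED =
        "([^ ]+) ([^ ]+) (\\[[^\\\\]*\\]) ([^ ]+) ([^ ]+) ([^ ]+) ([^ ]+) (-|[^ ]) "
        + "(\\\".*?\\\") ([0-9]{3}) (-|[0-9]+) ([0-9]+) (-|[^ ]) ([0-9]+) (-|[^ ]) "
        + "([^ ]+) ([^ ]+) (-|[^ ])";

    // Number of columns declared in the CREATE TABLE statement (owner ... version_id).
    static final int COLUMN_COUNT = 18;

    // Compiles the regex (throws PatternSyntaxException if invalid) and counts its groups.
    static int groupCount(String regex) {
        return Pattern.compile(regex).matcher("").groupCount();
    }

    public static void main(String[] args) {
        int groups = groupCount(AS_TYPED);
        System.out.println(groups + " groups for " + COLUMN_COUNT + " columns");
    }
}
```

Running `main` prints `18 groups for 18 columns`; no exception is thrown.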