以下是来自HIVE表的选择查询:
select * from test_aviation limit 5;
OK
2015 1 1 1 4 2015-01-01 AA 19805 AA N787AA 1 JFK New York NY NY 36 New York 22 LAX Los Angeles CA CA 06 California 91 0900 0855 -5.00 0.00 0.00 -1 0900-0959 17.00 0912 1230 7.00 1230 1237 7.00 7.00 0.00 0 1200-1259 0.00 0.00 390.00 402.00 378.00 1.00 2475.00 10
2015 1 1 2 5 2015-01-02 AA 19805 AA N795AA 1 JFK New York NY NY 36 New York 22 LAX Los Angeles CA CA 06 California 91 0900 0850 -10.00 0.00 0.00 -1 0900-0959 15.00 0905 1202 9.00 1230 1211 -19.00 0.00 0.00 -2 1200-1259 0.00 0.00 390.00 381.00 357.00 1.00 2475.00 10
2015 1 1 3 6 2015-01-03 AA 19805 AA N788AA 1 JFK New York NY NY 36 New York 22 LAX Los Angeles CA CA 06 California 91 0900 0853 -7.00 0.00 0.00 -1 0900-0959 15.00 0908 1138 13.00 1230 1151 -39.00 0.00 0.00 -2 1200-1259 0.00 0.00 390.00 358.00 330.00 1.00 2475.00 10
2015 1 1 4 7 2015-01-04 AA 19805 AA N791AA 1 JFK New York NY NY 36 New York 22 LAX Los Angeles CA CA 06 California 91 0900 0853 -7.00 0.00 0.00 -1 0900-0959 14.00 0907 1159 19.00 1230 1218 -12.00 0.00 0.00 -1 1200-1259 0.00 0.00 390.00 385.00 352.00 1.00 2475.00 10
2015 1 1 5 1 2015-01-05 AA 19805 AA N783AA 1 JFK New York NY NY 36 New York 22 LAX Los Angeles CA CA 06 California 91 0900 0853 -7.00 0.00 0.00 -1 0900-0959 27.00 0920 1158 24.00 1230 1222 -8.00 0.00 0.00 -1 1200-1259 0.00 0.00 390.00 389.00 338.00 1.00 2475.00 10
Time taken: 0.067 seconds, Fetched: 5 row(s)
HIVE表的结构
hive> describe test_aviation;
OK
col_value string
Time taken: 0.221 seconds, Fetched: 1 row(s)
我想在不同的列中分隔整个表。我写了一个如下的查询来提取第12列:
SELECT regexp_extract(col_value, '^(?:([^,]*)\,?){1}', 12) from test_aviation;
输出:
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1437067221195_0008, Tracking URL = http://localhost:8088/proxy/application_1437067221195_0008/
Kill Command = /usr/local/hadoop/bin/hadoop job -kill job_1437067221195_0008
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2015-07-17 02:46:56,215 Stage-1 map = 0%, reduce = 0%
2015-07-17 02:47:27,650 Stage-1 map = 100%, reduce = 0%
Ended Job = job_1437067221195_0008 with errors
Error during job, obtaining debugging information...
Job Tracking URL: http://localhost:8088/proxy/application_1437067221195_0008/
Examining task ID: task_1437067221195_0008_m_000000 (and more) from job job_1437067221195_0008
Task with the most failures(4):
-----
Task ID:
task_1437067221195_0008_m_000000
URL:
http://localhost:8088/taskdetails.jsp?jobid=job_1437067221195_0008&tipid=task_1437067221195_0008_m_000000
-----
Diagnostic Messages for this Task:
Error: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {"col_value":"2015\t1\t1\t1\t4\t2015-01-01\tAA\t19805\tAA\tN787AA\t1\tJFK\tNew York\t NY\tNY\t36\tNew York\t22\tLAX\tLos Angeles\t CA\tCA\t06\tCalifornia\t91\t0900\t0855\t-5.00\t0.00\t0.00\t-1\t0900-0959\t17.00\t0912\t1230\t7.00\t1230\t1237\t7.00\t7.00\t0.00\t0\t1200-1259\t0.00\t\t0.00\t390.00\t402.00\t378.00\t1.00\t2475.00\t10\t\t\t"}
at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:195)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {"col_value":"2015\t1\t1\t1\t4\t2015-01-01\tAA\t19805\tAA\tN787AA\t1\tJFK\tNew York\t NY\tNY\t36\tNew York\t22\tLAX\tLos Angeles\t CA\tCA\t06\tCalifornia\t91\t0900\t0855\t-5.00\t0.00\t0.00\t-1\t0900-0959\t17.00\t0912\t1230\t7.00\t1230\t1237\t7.00\t7.00\t0.00\t0\t1200-1259\t0.00\t\t0.00\t390.00\t402.00\t378.00\t1.00\t2475.00\t10\t\t\t"}
at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:550)
at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:177)
... 8 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to execute method public java.lang.String org.apache.hadoop.hive.ql.udf.UDFRegExpExtract.evaluate(java.lang.String,java.lang.String,java.lang.Integer) on object org.apache.hadoop.hive.ql.udf.UDFRegExpExtract@4def4616 of class org.apache.hadoop.hive.ql.udf.UDFRegExpExtract with arguments {2015 1 1 1 4 2015-01-01 AA 19805 AA N787AA 1 JFK New York NY NY 36 New York 22 LAX Los Angeles CA CA 06 California 91 0900 0855 -5.00 0.00 0.00 -1 0900-0959 17.00 0912 1230 7.00 1230 1237 7.00 7.00 0.00 0 1200-1259 0.00 0.00 390.00 402.00 378.00 1.00 2475.00 10 :java.lang.String, ^(?:([^,]*),?){1}:java.lang.String, 12:java.lang.Integer} of size 3
at org.apache.hadoop.hive.ql.exec.FunctionRegistry.invoke(FunctionRegistry.java:1243)
at org.apache.hadoop.hive.ql.udf.generic.GenericUDFBridge.evaluate(GenericUDFBridge.java:182)
at org.apache.hadoop.hive.ql.exec.ExprNodeGenericFuncEvaluator._evaluate(ExprNodeGenericFuncEvaluator.java:166)
at org.apache.hadoop.hive.ql.exec.ExprNodeEvaluator.evaluate(ExprNodeEvaluator.java:77)
at org.apache.hadoop.hive.ql.exec.ExprNodeEvaluator.evaluate(ExprNodeEvaluator.java:65)
at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:79)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:793)
at org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:92)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:793)
at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:540)
... 9 more
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.hive.ql.exec.FunctionRegistry.invoke(FunctionRegistry.java:1219)
... 18 more
Caused by: java.lang.IndexOutOfBoundsException: No group 12
at java.util.regex.Matcher.group(Matcher.java:487)
at org.apache.hadoop.hive.ql.udf.UDFRegExpExtract.evaluate(UDFRegExpExtract.java:56)
... 23 more
请帮助我从HIVE表中提取不同的列。
答案 0 :(得分:0)
试试这个:
select split(col_value,' ')[11] as column_12 from test_aviation;
假设你有空格分隔符。
'\\t' if tab
'\\|' for pipe...
':'
等等