我有一个列“类别”,其中包含类似这样的数据
"Failed extract of third-party root list from auto update cab at: <http://ctldl.windowsupdate.com/msdownload/update/v3/static/trustedr/en/authrootstl.cab> with error: The data is invalid."
我需要在类别列的" < > "
符号之间选择网址部分。
我写了一个配置单元查询 -
select level,category,regexp_extract(category,'http://[^\>]*') AS url from event where level='Error';
我有一个例外:
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_201406122248_0014, Tracking URL = http://0.0.0.0:50030/jobdetails.jsp?jobid=job_201406122248_0014
Kill Command = /usr/lib/hadoop/bin/hadoop job -kill job_201406122248_0014
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2014-06-13 02:13:35,696 Stage-1 map = 0%, reduce = 0%
2014-06-13 02:14:13,895 Stage-1 map = 100%, reduce = 100%
Ended Job = job_201406122248_0014 with errors
Error during job, obtaining debugging information...
Job Tracking URL: http://0.0.0.0:50030/jobdetails.jsp?jobid=job_201406122248_0014
Examining task ID: task_201406122248_0014_m_000002 (and more) from job job_201406122248_0014
Task with the most failures(4):
-----
Task ID:
task_201406122248_0014_m_000000
URL:
http://localhost.localdomain:50030/taskdetails.jsp?jobid=job_201406122248_0014&tipid=task_201406122248_0014_m_000000
-----
Diagnostic Messages for this Task:
java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {"level":"Error","datetimes":"6/13/2014 9:24:05 AM","source":"Microsoft-Windows-CAPI2","eventid":4107,"task":"None","category":"\"Failed extract of third-party root list from auto update cab at: <http://ctldl.windowsupdate.com/msdownload/update/v3/static/trustedr/en/authrootstl.cab> with error: The data is invalid."}
at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:159)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:417)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:332)
at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
at org.apache.hadoop.mapred.Child.main(Child.java:262)
Caused
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.MapRedTask
MapReduce Jobs Launched:
Job 0: Map: 1 HDFS Read: 0 HDFS Write: 0 FAIL
Total MapReduce CPU Time Spent: 0 msec
如何解决这个问题? 请帮忙。