I'm trying to load Apache logs, split them into fields, and save the result into HCatalog:
apache_log = LOAD 'httpd-www01-access.log.2014-02-09-*' USING TextLoader AS (line:chararray);
apache_row = FOREACH apache_log GENERATE FLATTEN(
    REGEX_EXTRACT_ALL(line, '^"(\\S+)" \\[(\\d{2}\\/\\w+\\/\\d{4}:\\d{2}:\\d{2}:\\d{2} \\+\\d{4}]) (\\S+) (\\S+) "(.+?)" (\\S+) (\\S+) "([^"]*)" "([^"]*)" "([^"]*)"'))
    AS (ip: chararray, datetime: chararray, session_id: chararray, time_of_request: chararray, request: chararray, status: chararray, size: chararray, referer: chararray, cookie: chararray, user_agent: chararray);
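Because the sample dump only looks at about 0.1% of the rows, it may be worth checking the regex against the raw lines offline before blaming HCatStorer. The sketch below is Python, not Pig: it copies the pattern from the script (Pig's doubled backslashes reduced to single ones), and the constructs it uses should behave the same in Python's re as in the Java regex engine Pig relies on. It assumes that REGEX_EXTRACT_ALL only succeeds when the whole line matches, hence re.fullmatch. The helpers parse_line and count_unparsed and the sample log line are hypothetical, made up for illustration.

```python
import re

# Pattern copied from the Pig script; Pig's '\\S' string escapes become r'\S' here.
# Assumption: REGEX_EXTRACT_ALL only succeeds when the whole line matches, so
# fullmatch() is used below (the leading ^ is then redundant but harmless).
APACHE_LOG_RE = re.compile(
    r'^"(\S+)" \[(\d{2}\/\w+\/\d{4}:\d{2}:\d{2}:\d{2} \+\d{4}]) (\S+) (\S+) '
    r'"(.+?)" (\S+) (\S+) "([^"]*)" "([^"]*)" "([^"]*)"'
)

# Field names in the same order as the AS (...) clause of the Pig script.
FIELDS = ("ip", "datetime", "session_id", "time_of_request", "request",
          "status", "size", "referer", "cookie", "user_agent")


def parse_line(line):
    """Return the ten fields as a dict, or None when the pattern does not match."""
    match = APACHE_LOG_RE.fullmatch(line)
    if match is None:
        return None
    return dict(zip(FIELDS, match.groups()))


def count_unparsed(lines):
    """Count lines (trailing newline stripped) that the pattern fails to match."""
    return sum(1 for line in lines if parse_line(line.rstrip("\n")) is None)


if __name__ == "__main__":
    # Hypothetical line in the format the pattern expects; not taken from the real logs.
    sample = ('"10.0.0.1" [09/Feb/2014:00:00:01 +0100] abc123 15 '
              '"GET / HTTP/1.1" 200 512 "-" "JSESSIONID=x" "Mozilla/5.0"')
    print(parse_line(sample))
    print(count_unparsed([sample, "not an access log line"]))
```

Running count_unparsed over a few of the raw files (for example, passing it an open file object) shows how many lines would likely come back as null fields in Pig. Two details of the pattern itself stand out: the closing ] sits inside the second capture group, so datetime keeps the bracket, and a timezone offset written with - (such as -0500) cannot match \+.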
If I do this:
a = sample apache_row 0.001;
dump a;
it works.
But this:
store apache_row into 'stage.apache_log' using org.apache.hcatalog.pig.HCatStorer();
does not work. Error:
2014-02-17 08:17:13,812 [main] WARN org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Ooops! Some job has failed! Specify -stop_on_failure if you want Pig to stop immediately on failure.
2014-02-17 08:17:13,812 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - job job_201402120751_0117 has failed! Stop running all dependent jobs
2014-02-17 08:17:13,812 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete
2014-02-17 08:17:13,814 [main] ERROR org.apache.pig.tools.pigstats.PigStatsUtil - 1 map reduce job(s) failed!
2014-02-17 08:17:13,815 [main] INFO org.apache.pig.tools.pigstats.SimplePigStats - Script Statistics:
HadoopVersion PigVersion UserId StartedAt FinishedAt Features
1.2.0.1.3.2.0-111 0.11.1.1.3.2.0-111 pig 2014-02-17 08:16:24 2014-02-17 08:17:13 UNKNOWN
Failed!
Failed Jobs:
JobId Alias Feature Message Outputs
job_201402120751_0117 apache_log,apache_row MAP_ONLY Message: Job failed! Error - # of failed Map Tasks exceeded allowed limit. FailedCount: 1. LastFailedTask: task_201402120751_0117_m_000000 stage.atg_apache_log,
Input(s):
Failed to read data from "hdfs://hadoop1:8020/user/pig/httpd-www01-access.log.2014-02-09-*"
Output(s):
Failed to produce result in "stage.apache_log"
Counters:
Total records written : 0
Total bytes written : 0
Spillable Memory Manager spill count : 0
Total bags proactively spilled: 0
Total records proactively spilled: 0
Job DAG:
job_201402120751_0117
Where can I find any details about the problem?
The output mentions a page where I can find more details:
hadoop1:50030/jobdetails.jsp?jobid=job_201402120751_0117
but it no longer works once the job has finished...
Regards,
Pawel