我正在尝试分析apache日志,目标是找出所有用户代理及其使用百分比。当结果包含每个useragent,count和percentage时,以下程序对该行正常工作。当尝试按照最常用的顺序排序时,程序在最后一行失败。有人可以帮忙吗?
logs = LOAD '$LOGS' USING ApacheCombinedLogLoader AS (remoteHost, hyphen, user, time, method, uri, protocol, statusCode, responseSize, referer, userAgent);
uarows = FOREACH logs GENERATE userAgent;
total = FOREACH (GROUP uarows ALL) GENERATE COUNT(uarows) as count;
dump total;
gpuarows = GROUP uarows BY userAgent;
result = FOREACH gpuarows {
subtotal = COUNT(uarows);
GENERATE flatten(group) as ua, subtotal AS SUB_TOTAL, 100*(double)subtotal/(double)total.count AS percentage;
};
orderresult = ORDER result BY SUB_TOTAL DESC;
dump orderresult;
奇怪的是'转储结果'工作得很好,所以ORDER行就麻烦了
错误:
013-04-13 11:33:09,976 [Thread-48] INFO org.apache.hadoop.mapred.MapTask - data buffer = 79691776/99614720
2013-04-13 11:33:09,976 [Thread-48] INFO org.apache.hadoop.mapred.MapTask - record buffer = 262144/327680
2013-04-13 11:33:09,995 [Thread-48] WARN org.apache.hadoop.mapred.LocalJobRunner - job_local_0005
java.lang.RuntimeException: org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: file:/home/dliu/ApacheLogAnalysisWithPig/pigsample_1573648613_1365823989735
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.partitioners.WeightedRangePartitioner.setConf(WeightedRangePartitioner.java:157)
at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:62)
at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
at org.apache.hadoop.mapred.MapTask$NewOutputCollector.<init>(MapTask.java:677)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:756)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)
Caused by: org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: file:/home/dliu/ApacheLogAnalysisWithPig/pigsample_1573648613_1365823989735
at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:235)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigFileInputFormat.listStatus(PigFileInputFormat.java:37)
at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:252)
at org.apache.pig.impl.io.ReadToEndLoader.init(ReadToEndLoader.java:177)
at org.apache.pig.impl.io.ReadToEndLoader.<init>(ReadToEndLoader.java:124)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.partitioners.WeightedRangePartitioner.setConf(WeightedRangePartitioner.java:131)
... 6 more
2013-04-13 11:33:10,276 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - HadoopJobId: job_local_0005
2013-04-13 11:33:10,276 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Processing aliases orderresult
2013-04-13 11:33:10,276 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - detailed locations: M: orderresult[16,14] C: R:
2013-04-13 11:33:15,286 [main] WARN org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Ooops! Some job has failed! Specify -stop_on_failure if you want Pig to stop immediately on failure.
2013-04-13 11:33:15,286 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - job job_local_0005 has failed! Stop running all dependent jobs
2013-04-13 11:33:15,287 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete
2013-04-13 11:33:15,287 [main] ERROR org.apache.pig.tools.pigstats.PigStatsUtil - 1 map reduce job(s) failed!
2013-04-13 11:33:15,288 [main] INFO org.apache.pig.tools.pigstats.SimplePigStats - Script Statistics:
HadoopVersion PigVersion UserId StartedAt FinishedAt Features
1.0.4 0.11.0 dliu 2013-04-13 11:32:27 2013-04-13 11:33:15 GROUP_BY,ORDER_BY
Some jobs have failed! Stop running all dependent jobs
Job Stats (time in seconds):
JobId Maps Reduces MaxMapTime MinMapTIme AvgMapTime MedianMapTime MaxReduceTime MinReduceTime AvgReduceTime MedianReducetime Alias Feature Outputs
job_local_0002 1 1 n/a n/a n/a n/a n/a n/a 1-18,logs,total,uarows MULTI_QUERY,COMBINER
job_local_0003 1 1 n/a n/a n/a n/a n/a n/a gpuarows,result GROUP_BY,COMBINER
job_local_0004 1 1 n/a n/a n/a n/a n/a n/a orderresult SAMPLER
Failed Jobs:
JobId Alias Feature Message Outputs
job_local_0005 orderresult ORDER_BY Message: Job failed! Error - NA file:/tmp/temp265162785/tmp896004388,
Input(s):
Successfully read 0 records from: "file:///home/dliu/ApacheLogAnalysisWithPig/access.log"
Output(s):
Failed to produce result in "file:/tmp/temp265162785/tmp896004388"
Counters:
Total records written : 0
Total bytes written : 0
Spillable Memory Manager spill count : 0
Total bags proactively spilled: 0
Total records proactively spilled: 0
Job DAG:
job_local_0002 -> job_local_0003,
job_local_0003 -> job_local_0004,
job_local_0004 -> job_local_0005,
job_local_0005
2013-04-13 11:33:15,291 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Some jobs have failed! Stop running all dependent jobs
2013-04-13 11:33:15,297 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1066: Unable to open iterator for alias orderresult
Details at logfile: /home/dliu/ApacheLogAnalysisWithPig/pig_1365823931459.log
答案 0 :(得分:2)
确保两件事:
1)以当地模式运行猪:猪-x本地 2)将PIG_HOME或PIG_INSTALL环境变量设置为指向pig安装目录
答案 1 :(得分:1)
请检查您是否已经存档/ tmp / temp265162785 / tmp896004388 您可以将相同的文件\目录用于不同的任务。