如何解决以下apache pig错误?

时间:2015-11-13 14:47:20

标签: hadoop apache-pig

我正在执行以下命令:

A= load 'user/cloudera' using PigStorage(':');
foreach A generate $0,$4,$5;
dump B;

在执行最后一个命令时,我得到以下错误,我无法解决。对于bigdata和apache hadoop堆栈的新手,我无法理解这个错误。请尽快帮助。还要在StackOverflow上搜索类似的错误没有帮助:

2015-11-13 06:36:46,170 [main] INFO  org.apache.pig.tools.pigstats.ScriptState - Pig features used in the script: UNKNOWN
2015-11-13 06:36:46,208 [main] INFO  org.apache.pig.newplan.logical.optimizer.LogicalPlanOptimizer - {RULES_ENABLED=[AddForEach, ColumnMapKeyPrune, DuplicateForEachColumnRewrite, GroupByConstParallelSetter, ImplicitSplitInserter, LimitOptimizer, LoadTypeCastInserter, MergeFilter, MergeForEach, NewPartitionFilterOptimizer, PushDownForEachFlatten, PushUpFilter, SplitFilter, StreamTypeCastInserter], RULES_DISABLED=[FilterLogicExpressionSimplifier, PartitionFilterOptimizer]}
2015-11-13 06:36:46,212 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false
2015-11-13 06:36:46,225 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1
2015-11-13 06:36:46,225 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1
2015-11-13 06:36:46,404 [main] INFO  org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at /0.0.0.0:8032
2015-11-13 06:36:46,415 [main] INFO  org.apache.pig.tools.pigstats.ScriptState - Pig script settings are added to the job
2015-11-13 06:36:46,445 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
2015-11-13 06:36:49,232 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - creating jar file Job306801006066349255.jar
2015-11-13 06:37:04,185 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - jar file Job306801006066349255.jar created
2015-11-13 06:37:04,223 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up single store job
2015-11-13 06:37:04,238 [main] INFO  org.apache.pig.data.SchemaTupleFrontend - Key [pig.schematuple] is false, will not generate code.
2015-11-13 06:37:04,238 [main] INFO  org.apache.pig.data.SchemaTupleFrontend - Starting process to move generated code to distributed cache
2015-11-13 06:37:04,238 [main] INFO  org.apache.pig.data.SchemaTupleFrontend - Setting key [pig.schematuple.classes] with classes to deserialize []
2015-11-13 06:37:04,274 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission.
2015-11-13 06:37:04,274 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
2015-11-13 06:37:04,283 [JobControl] INFO  org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at /0.0.0.0:8032
2015-11-13 06:37:04,363 [JobControl] INFO  org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2015-11-13 06:37:05,416 [JobControl] INFO  org.apache.hadoop.mapreduce.JobSubmitter - Cleaning up the staging area /tmp/hadoop-yarn/staging/cloudera/.staging/job_1447417089361_0004
2015-11-13 06:37:05,420 [JobControl] WARN  org.apache.hadoop.security.UserGroupInformation - PriviledgedActionException as:cloudera (auth:SIMPLE) cause:org.apache.pig.backend.executionengine.ExecException: ERROR 2118: Input path does not exist: hdfs://quickstart.cloudera:8020/user/cloudera/user/cloudera
2015-11-13 06:37:05,420 [JobControl] INFO  org.apache.hadoop.mapreduce.lib.jobcontrol.ControlledJob - PigLatin:DefaultJobName got an error while submitting 
org.apache.pig.backend.executionengine.ExecException: ERROR 2118: Input path does not exist: hdfs://quickstart.cloudera:8020/user/cloudera/user/cloudera
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:288)
    at org.apache.hadoop.mapreduce.JobSubmitter.writeNewSplits(JobSubmitter.java:597)
    at org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:614)
    at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:492)
    at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1306)
    at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1303)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
    at org.apache.hadoop.mapreduce.Job.submit(Job.java:1303)
    at org.apache.hadoop.mapreduce.lib.jobcontrol.ControlledJob.submit(ControlledJob.java:335)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.pig.backend.hadoop23.PigJobControl.submit(PigJobControl.java:128)
    at org.apache.pig.backend.hadoop23.PigJobControl.run(PigJobControl.java:191)
    at java.lang.Thread.run(Thread.java:745)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher$1.run(MapReduceLauncher.java:270)
Caused by: org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: hdfs://quickstart.cloudera:8020/user/cloudera/user/cloudera
    at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:321)
    at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:264)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigTextInputFormat.listStatus(PigTextInputFormat.java:36)
    at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:385)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:274)
    ... 18 more
2015-11-13 06:37:05,423 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - HadoopJobId: job_1447417089361_0004
2015-11-13 06:37:05,423 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Processing aliases A,B
2015-11-13 06:37:05,423 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - detailed locations: M: A[3,3],B[4,3] C:  R: 
2015-11-13 06:37:05,423 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - More information at: http://localhost:50030/jobdetails.jsp?jobid=job_1447417089361_0004
2015-11-13 06:37:05,440 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete
2015-11-13 06:37:10,463 [main] WARN  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Ooops! Some job has failed! Specify -stop_on_failure if you want Pig to stop immediately on failure.
2015-11-13 06:37:10,463 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - job job_1447417089361_0004 has failed! Stop running all dependent jobs
2015-11-13 06:37:10,463 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete
2015-11-13 06:37:10,620 [main] INFO  org.apache.hadoop.mapred.ClientServiceDelegate - Could not get Job info from RM for job job_1447417089361_0004. Redirecting to job history server.
2015-11-13 06:37:10,844 [main] INFO  org.apache.hadoop.mapred.ClientServiceDelegate - Could not get Job info from RM for job job_1447417089361_0004. Redirecting to job history server.
2015-11-13 06:37:10,849 [main] ERROR org.apache.pig.tools.pigstats.PigStatsUtil - 1 map reduce job(s) failed!
2015-11-13 06:37:10,850 [main] INFO  org.apache.pig.tools.pigstats.SimplePigStats - Script Statistics: 

HadoopVersion   PigVersion  UserId  StartedAt   FinishedAt  Features
2.6.0-cdh5.4.2  0.12.0-cdh5.4.2 cloudera    2015-11-13 06:36:46 2015-11-13 06:37:10 UNKNOWN

Failed!

Failed Jobs:
JobId   Alias   Feature Message Outputs
job_1447417089361_0004  A,B MAP_ONLY    Message: org.apache.pig.backend.executionengine.ExecException: ERROR 2118: Input path does not exist: hdfs://quickstart.cloudera:8020/user/cloudera/user/cloudera
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:288)
    at org.apache.hadoop.mapreduce.JobSubmitter.writeNewSplits(JobSubmitter.java:597)
    at org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:614)
    at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:492)
    at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1306)
    at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1303)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
    at org.apache.hadoop.mapreduce.Job.submit(Job.java:1303)
    at org.apache.hadoop.mapreduce.lib.jobcontrol.ControlledJob.submit(ControlledJob.java:335)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.pig.backend.hadoop23.PigJobControl.submit(PigJobControl.java:128)
    at org.apache.pig.backend.hadoop23.PigJobControl.run(PigJobControl.java:191)
    at java.lang.Thread.run(Thread.java:745)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher$1.run(MapReduceLauncher.java:270)
Caused by: org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: hdfs://quickstart.cloudera:8020/user/cloudera/user/cloudera
    at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:321)
    at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:264)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigTextInputFormat.listStatus(PigTextInputFormat.java:36)
    at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:385)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:274)
    ... 18 more
    hdfs://quickstart.cloudera:8020/tmp/temp-193566860/tmp-1023933528,

Input(s):
Failed to read data from "hdfs://quickstart.cloudera:8020/user/cloudera/user/cloudera"

Output(s):
Failed to produce result in "hdfs://quickstart.cloudera:8020/tmp/temp-193566860/tmp-1023933528"

Counters:
Total records written : 0
Total bytes written : 0
Spillable Memory Manager spill count : 0
Total bags proactively spilled: 0
Total records proactively spilled: 0

Job DAG:
job_1447417089361_0004


2015-11-13 06:37:10,850 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Failed!
2015-11-13 06:37:10,853 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1066: Unable to open iterator for alias B
Details at logfile: /home/cloudera/pig_1447424730804.log

5 个答案:

答案 0 :(得分:4)

<强>解决方案:

将代码的第1行更改为:

A= load '/user/cloudera' using PigStorage(':');

话虽如此,您确定您的输入数据是在您的家庭区域吗?这似乎不太可能。它更有可能位于您所在地区的文件夹中,即/user/cloudera/input-data

在开始工作之前,请执行以下操作:

hdfs dfs -ls /user/cloudera

确认输入数据实际位于该文件夹中。如果不是,请确定其实际位置,并确保它在HDFS上,而不是在本地。

<强>解释

日志的相关部分是

ERROR 2118: Input path does not exist: hdfs://quickstart.cloudera:8020/user/cloudera/user/cloudera

这表明它与输入路径有关。处理输入路径的代码部分是

A= load 'user/cloudera' using PigStorage(':');

通过不向/user添加正斜杠,它会假定所有内容都与您的家庭区域相关,因此例如写load 'input'会导致Pig作业在hdfs://quickstart.cloudera:8028/user/cloudera/input中读取。在你的情况下,丢失的斜杠意味着它将它添加到你的用户区。

答案 1 :(得分:1)

请检查在本地/ linux模式或mapreduce模式下运行pig语句的位置,如果您运行本地模式,则无法读取HDFS文件

运行此cmd $ pig -x mapreduce

之后 咕噜&GT; A =加载'/; Grunt&gt;转储/存储/

答案 2 :(得分:0)

输入(S):

Failed to read data from 

"hdfs://quickstart.cloudera:8020/user/cloudera/user/cloudera"

输出(一个或多个):

Failed to produce result in "hdfs://quickstart.cloudera:8020/tmp/temp-193566860/tmp-1023933528"

计数器:

Total records written : 0

Total bytes written : 0

Spillable Memory Manager spill count : 0

Total bags proactively spilled: 0

Total records proactively spilled: 0

如果您查看上面的错误代码(从发布的问题中复制),输入文件路径由pig获取,如下所示

<强> 'HDFS://quickstart.cloudera:8020 /用户/ Cloudera的/用户/ Cloudera的'

因此,当在mapreduce模式下运行pig并使用快速启动VM时,您可以使用

输入文件路径之前的

'hdfs://quickstart.cloudera:8020'....

输入是文件夹吗?...如果是这样猪脚本将读取文件夹中的所有文件....

示例:

说你的输入文件路径位于hdfs '/ usr / cloudera / inputdata.txt',那么你的查询必须是

A =使用PigStorage(':')加载'hdfs://quickstart.cloudera:8020 / user / cloudera / inputdata.txt';

foreach A生成$ 0,$ 4,$ 5;

转储B;

答案 3 :(得分:-1)

您唯一需要检查的是文件路径。

  1. hdfs dfs -ls / 检查路径结构是如何来的。 像.... / input / data / file。 文件你打算用。
  2. A = load&#39; / input / data / file&#39;;
  3. dump A;
  4. 唯一的问题是您要加载的文件的正确路径。

答案 4 :(得分:-1)

我遇到了类似的问题,我可以通过从目标文件夹中删除passwd文件然后使用正确的相对路径再次复制来解决此问题:

A= load '/user/cloudera' using PigStorage(':');

视频中的说明缺少&#39; /&#39;在用户之前

  • 第1步:

    hdfs dfs -rm /user/cloudera/passwd
    
  • 第2步:

    A= load '/user/cloudera' using PigStorage(':');
    

然后按照Coursera视频中的说明进行操作。