I am trying to learn Pig with my first script (on Apache Hadoop), and I am trying to read a file containing data like the rows shown below. I really have no clue what the error means. Can anyone help me figure this out?
M,0.455,0.365,0.095,0.514,0.2245,0.101,0.15,15
M,0.35,0.265,0.09,0.2255,0.0995,0.0485,0.07,7
F,0.53,0.42,0.135,0.677,0.2565,0.1415,0.21,9
M,0.44,0.365,0.125,0.516,0.2155,0.114,0.155,10
I,0.33,0.255,0.08,0.205,0.0895,0.0395,0.055,7
I,0.425,0.3,0.095,0.3515,0.141,0.0775,0.12,8
F,0.53,0.415,0.15,0.7775,0.237,0.1415,0.33,20
F,0.545,0.425,0.125,0.768,0.294,0.1495,0.26,16
M,0.475,0.37,0.125,0.5095,0.2165,0.1125,0.165,9
F,0.55,0.44,0.15,0.8945,0.3145,0.151,0.32,19
The file name is abalone.txt. I loaded the input file into the HDFS /input folder and verified it with this command:
notroot@ubuntu:~$ hadoop fs -ls /input
Warning: $HADOOP_HOME is deprecated.
Found 2 items
-rw-r--r-- 1 notroot supergroup 191873 2013-09-12 06:21 /input/abalone.txt
-rw-r--r-- 1 notroot supergroup 81468050 2013-07-07 05:12 /input/weblogs
After that, when I try to read the file, I use the following commands:
notroot@ubuntu:~$ pig
Warning: $HADOOP_HOME is deprecated.
2013-09-17 06:18:06,361 [main] INFO org.apache.pig.Main - Apache Pig version 0.10.0 (r1328203) compiled Apr 19 2012, 22:54:12
2013-09-17 06:18:06,361 [main] INFO org.apache.pig.Main - Logging error messages to: /home/notroot/pig_1379398686357.log
2013-09-17 06:18:06,608 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: hdfs://localhost:8020
2013-09-17 06:18:07,168 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to map-reduce job tracker at: localhost:8021
grunt> abalone = LOAD 'input/abalone.txt' using PigStorage(',') AS (sex:chararray,length:double,diameter:double,height:double,w_weight:double,s_weight:double,v_weight:double,shell_weight:double,rings:int);
grunt> lmt = LIMIT abalone 20;
grunt> DUMP lmt;
.
.
.
Input(s):
Failed to read data from "hdfs://localhost:8020/user/notroot/input/abalone.txt"
Output(s):
Failed to produce result in "hdfs://localhost:8020/tmp/temp-282841677/tmp530587011"
Counters:
Total records written : 0
Total bytes written : 0
Spillable Memory Manager spill count : 0
Total bags proactively spilled: 0
Total records proactively spilled: 0
Job DAG:
null
2013-09-17 06:27:30,818 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Failed!
2013-09-17 06:27:30,823 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1066: Unable to open iterator for alias lmt
Details at logfile: /home/notroot/pig_1379399216117.log
But I get the following error in the Pig log file:
org.apache.pig.backend.executionengine.ExecException: ERROR 2118: Input path does not exist: file:/home/notroot/abalone.txt
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:285)
at org.apache.hadoop.mapred.JobClient.writeNewSplits(JobClient.java:962)
at org.apache.hadoop.mapred.JobClient.writeSplits(JobClient.java:979)
at org.apache.hadoop.mapred.JobClient.access$600(JobClient.java:174)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:897)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:850)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:416)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:850)
at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:824)
at org.apache.hadoop.mapred.jobcontrol.Job.submit(Job.java:378)
at org.apache.hadoop.mapred.jobcontrol.JobControl.startReadyJobs(JobControl.java:247)
at org.apache.hadoop.mapred.jobcontrol.JobControl.run(JobControl.java:279)
at java.lang.Thread.run(Thread.java:679)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher$1.run(MapReduceLauncher.java:260)
Caused by: org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: file:/home/notroot/abalone.txt
at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:235)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigTextInputFormat.listStatus(PigTextInputFormat.java:36)
at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:252)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:273)
... 15 more
Pig Stack Trace
ERROR 1066: Unable to open iterator for alias abalone
org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to open iterator for alias abalone
at org.apache.pig.PigServer.openIterator(PigServer.java:857)
at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:682)
at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:303)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:189)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:165)
at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:69)
at org.apache.pig.Main.run(Main.java:490)
at org.apache.pig.Main.main(Main.java:111)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:616)
at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
Caused by: java.io.IOException: Job terminated with anomalous status FAILED
at org.apache.pig.PigServer.openIterator(PigServer.java:849)
... 12 more
ERROR 1000: Error during parsing. Encountered " <IDENTIFIER> "hadoop "" at line 1, column 1.
Was expecting one of:
<EOF>
"cat" ...
"fs" ...
"sh" ...
"cd" ...
"cp" ...
"copyFromLocal" ...
"copyToLocal" ...
"dump" ...
"describe" ...
"aliases" ...
"explain" ...
"help" ...
"kill" ...
"ls" ...
"mv" ...
"mkdir" ...
"pwd" ...
"quit" ...
"register" ...
"rm" ...
"rmf" ...
"set" ...
"illustrate" ...
"run" ...
"exec" ...
"scriptDone" ...
"" ...
<EOL> ...
";" ...
org.apache.pig.tools.pigscript.parser.ParseException: Encountered " <IDENTIFIER> "hadoop "" at line 1, column 1.
Was expecting one of:
<EOF>
"cat" ...
"fs" ...
"sh" ...
"cd" ...
"cp" ...
"copyFromLocal" ...
"copyToLocal" ...
"dump" ...
"describe" ...
"aliases" ...
"explain" ...
"help" ...
"kill" ...
"ls" ...
"mv" ...
"mkdir" ...
"pwd" ...
"quit" ...
"register" ...
"rm" ...
"rmf" ...
"set" ...
"illustrate" ...
"run" ...
"exec" ...
"scriptDone" ...
"" ...
<EOL> ...
";" ...
at org.apache.pig.tools.pigscript.parser.PigScriptParser.generateParseException(PigScriptParser.java:1118)
at org.apache.pig.tools.pigscript.parser.PigScriptParser.handle_invalid_command(PigScriptParser.java:934)
at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:527)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:189)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:165)
at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:69)
at org.apache.pig.Main.run(Main.java:490)
at org.apache.pig.Main.main(Main.java:111)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:616)
at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
ERROR 1000: Error during parsing. Encountered " <IDENTIFIER> "botroot "" at line 1, column 1.
Was expecting one of:
<EOF>
"cat" ...
"fs" ...
"sh" ...
"cd" ...
"cp" ...
"copyFromLocal" ...
"copyToLocal" ...
"dump" ...
"describe" ...
"aliases" ...
"explain" ...
"help" ...
"kill" ...
"ls" ...
"mv" ...
"mkdir" ...
"pwd" ...
"quit" ...
"register" ...
"rm" ...
"rmf" ...
"set" ...
"illustrate" ...
"run" ...
"exec" ...
"scriptDone" ...
"" ...
<EOL> ...
";" ...
org.apache.pig.tools.pigscript.parser.ParseException: Encountered " <IDENTIFIER> "botroot "" at line 1, column 1.
Was expecting one of:
<EOF>
"cat" ...
"fs" ...
"sh" ...
"cd" ...
"cp" ...
"copyFromLocal" ...
"copyToLocal" ...
"dump" ...
"describe" ...
"aliases" ...
"explain" ...
"help" ...
"kill" ...
"ls" ...
"mv" ...
"mkdir" ...
"pwd" ...
"quit" ...
"register" ...
"rm" ...
"rmf" ...
"set" ...
"illustrate" ...
"run" ...
"exec" ...
"scriptDone" ...
"" ...
<EOL> ...
";" ...
at org.apache.pig.tools.pigscript.parser.PigScriptParser.generateParseException(PigScriptParser.java:1118)
at org.apache.pig.tools.pigscript.parser.PigScriptParser.handle_invalid_command(PigScriptParser.java:934)
at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:527)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:189)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:165)
at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:69)
at org.apache.pig.Main.run(Main.java:490)
at org.apache.pig.Main.main(Main.java:111)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:616)
at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
Answer 0 (score: 0)
As mentioned in the comments, the problem is easily solved by adding a leading slash so the path is absolute on HDFS:
load '/input/abalone.txt'
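For reference, the full corrected session could look like this; it is only a sketch that reuses the schema and aliases from the question:

grunt> abalone = LOAD '/input/abalone.txt' USING PigStorage(',') AS (sex:chararray,length:double,diameter:double,height:double,w_weight:double,s_weight:double,v_weight:double,shell_weight:double,rings:int);
grunt> lmt = LIMIT abalone 20;
grunt> DUMP lmt;

Without the leading slash, Pig resolves the relative path under the current user's HDFS home directory (here hdfs://localhost:8020/user/notroot/input/abalone.txt, as shown in the failure summary), which is why the load failed.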
Answer 1 (score: 0)
Yes, use /input plus the input file name as the path:

abalone = LOAD '/input/abalone.txt' using PigStorage(',') AS (sex:chararray,length:double,diameter:double,height:double,w_weight:double,s_weight:double,v_weight:double,shell_weight:double,rings:int);
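Before running the DUMP you can also confirm, from the grunt shell itself, that the absolute path resolves on HDFS; a minimal check, assuming the same cluster layout as in the question:

grunt> fs -ls /input/abalone.txt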
Answer 2 (score: 0)
While assigning to an alias, make sure there is a space before or after the '='. For example:

abalone= LOAD '/input/abalone.txt' using PigStorage(',') AS (sex:chararray,length:double,diameter:double,height:double,w_weight:double,s_weight:double,v_weight:double,shell_weight:double,rings:int);

or

abalone =LOAD '/input/abalone.txt' using PigStorage(',') AS (sex:chararray,length:double,diameter:double,height:double,w_weight:double,s_weight:double,v_weight:double,shell_weight:double,rings:int);