Pig cannot find the input file

Time: 2013-09-21 05:45:11

Tags: file hadoop apache-pig

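I am trying to learn Pig. In my first script (on Apache Hadoop), I am trying to read a file containing data like the sample below. I really have no clue what is wrong. Can anyone help me solve this?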

M,0.455,0.365,0.095,0.514,0.2245,0.101,0.15,15
M,0.35,0.265,0.09,0.2255,0.0995,0.0485,0.07,7
F,0.53,0.42,0.135,0.677,0.2565,0.1415,0.21,9
M,0.44,0.365,0.125,0.516,0.2155,0.114,0.155,10
I,0.33,0.255,0.08,0.205,0.0895,0.0395,0.055,7
I,0.425,0.3,0.095,0.3515,0.141,0.0775,0.12,8
F,0.53,0.415,0.15,0.7775,0.237,0.1415,0.33,20
F,0.545,0.425,0.125,0.768,0.294,0.1495,0.26,16
M,0.475,0.37,0.125,0.5095,0.2165,0.1125,0.165,9
F,0.55,0.44,0.15,0.8945,0.3145,0.151,0.32,19

The file is named abalone.txt. I loaded it into the /input folder in HDFS and verified it with this command:

notroot@ubuntu:~$ hadoop fs -ls /input
Warning: $HADOOP_HOME is deprecated.

Found 2 items
-rw-r--r--   1 notroot supergroup     191873 2013-09-12 06:21 /input/abalone.txt
-rw-r--r--   1 notroot supergroup   81468050 2013-07-07 05:12 /input/weblogs

After that, I tried to read the file with the following commands:

notroot@ubuntu:~$ pig
Warning: $HADOOP_HOME is deprecated.

2013-09-17 06:18:06,361 [main] INFO  org.apache.pig.Main - Apache Pig version 0.10.0 (r1328203) compiled Apr 19 2012, 22:54:12
2013-09-17 06:18:06,361 [main] INFO  org.apache.pig.Main - Logging error messages to: /home/notroot/pig_1379398686357.log
2013-09-17 06:18:06,608 [main] INFO  org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: hdfs://localhost:8020
2013-09-17 06:18:07,168 [main] INFO  org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to map-reduce job tracker at: localhost:8021

grunt> abalone = LOAD 'input/abalone.txt' using PigStorage(',') AS (sex:chararray,length:double,diameter:double,height:double,w_weight:double,s_weight:double,v_weight:double,shell_weight:double,rings:int);
grunt> lmt = LIMIT abalone 20;
grunt> DUMP lmt;
.
.
.
Input(s):
Failed to read data from "hdfs://localhost:8020/user/notroot/input/abalone.txt"

Output(s):
Failed to produce result in "hdfs://localhost:8020/tmp/temp-282841677/tmp530587011"

Counters:
Total records written : 0
Total bytes written : 0
Spillable Memory Manager spill count : 0
Total bags proactively spilled: 0
Total records proactively spilled: 0

Job DAG:
null


2013-09-17 06:27:30,818 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Failed!
2013-09-17 06:27:30,823 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1066: Unable to open iterator for alias lmt
Details at logfile: /home/notroot/pig_1379399216117.log

However, the Pig log file contains the following errors:

org.apache.pig.backend.executionengine.ExecException: ERROR 2118: Input path does not exist: file:/home/notroot/abalone.txt
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:285)
    at org.apache.hadoop.mapred.JobClient.writeNewSplits(JobClient.java:962)
    at org.apache.hadoop.mapred.JobClient.writeSplits(JobClient.java:979)
    at org.apache.hadoop.mapred.JobClient.access$600(JobClient.java:174)
    at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:897)
    at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:850)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:416)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
    at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:850)
    at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:824)
    at org.apache.hadoop.mapred.jobcontrol.Job.submit(Job.java:378)
    at org.apache.hadoop.mapred.jobcontrol.JobControl.startReadyJobs(JobControl.java:247)
    at org.apache.hadoop.mapred.jobcontrol.JobControl.run(JobControl.java:279)
    at java.lang.Thread.run(Thread.java:679)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher$1.run(MapReduceLauncher.java:260)
Caused by: org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: file:/home/notroot/abalone.txt
    at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:235)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigTextInputFormat.listStatus(PigTextInputFormat.java:36)
    at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:252)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:273)
    ... 15 more

Pig Stack Trace

ERROR 1066: Unable to open iterator for alias abalone

org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to open iterator for alias abalone
    at org.apache.pig.PigServer.openIterator(PigServer.java:857)
    at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:682)
    at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:303)
    at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:189)
    at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:165)
    at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:69)
    at org.apache.pig.Main.run(Main.java:490)
    at org.apache.pig.Main.main(Main.java:111)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:616)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
Caused by: java.io.IOException: Job terminated with anomalous status FAILED
    at org.apache.pig.PigServer.openIterator(PigServer.java:849)
    ... 12 more

ERROR 1000: Error during parsing. Encountered " <IDENTIFIER> "hadoop "" at line 1, column 1.
Was expecting one of:
    <EOF> 
    "cat" ...
    "fs" ...
    "sh" ...
    "cd" ...
    "cp" ...
    "copyFromLocal" ...
    "copyToLocal" ...
    "dump" ...
    "describe" ...
    "aliases" ...
    "explain" ...
    "help" ...
    "kill" ...
    "ls" ...
    "mv" ...
    "mkdir" ...
    "pwd" ...
    "quit" ...
    "register" ...
    "rm" ...
    "rmf" ...
    "set" ...
    "illustrate" ...
    "run" ...
    "exec" ...
    "scriptDone" ...
    "" ...
    <EOL> ...
    ";" ...


org.apache.pig.tools.pigscript.parser.ParseException: Encountered " <IDENTIFIER> "hadoop "" at line 1, column 1.
Was expecting one of:
    <EOF> 
    "cat" ...
    "fs" ...
    "sh" ...
    "cd" ...
    "cp" ...
    "copyFromLocal" ...
    "copyToLocal" ...
    "dump" ...
    "describe" ...
    "aliases" ...
    "explain" ...
    "help" ...
    "kill" ...
    "ls" ...
    "mv" ...
    "mkdir" ...
    "pwd" ...
    "quit" ...
    "register" ...
    "rm" ...
    "rmf" ...
    "set" ...
    "illustrate" ...
    "run" ...
    "exec" ...
    "scriptDone" ...
    "" ...
    <EOL> ...
    ";" ...

    at org.apache.pig.tools.pigscript.parser.PigScriptParser.generateParseException(PigScriptParser.java:1118)
    at org.apache.pig.tools.pigscript.parser.PigScriptParser.handle_invalid_command(PigScriptParser.java:934)
    at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:527)
    at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:189)
    at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:165)
    at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:69)
    at org.apache.pig.Main.run(Main.java:490)
    at org.apache.pig.Main.main(Main.java:111)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:616)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:156)

ERROR 1000: Error during parsing. Encountered " <IDENTIFIER> "botroot "" at line 1, column 1.
Was expecting one of:
    <EOF> 
    "cat" ...
    "fs" ...
    "sh" ...
    "cd" ...
    "cp" ...
    "copyFromLocal" ...
    "copyToLocal" ...
    "dump" ...
    "describe" ...
    "aliases" ...
    "explain" ...
    "help" ...
    "kill" ...
    "ls" ...
    "mv" ...
    "mkdir" ...
    "pwd" ...
    "quit" ...
    "register" ...
    "rm" ...
    "rmf" ...
    "set" ...
    "illustrate" ...
    "run" ...
    "exec" ...
    "scriptDone" ...
    "" ...
    <EOL> ...
    ";" ...

org.apache.pig.tools.pigscript.parser.ParseException: Encountered " <IDENTIFIER> "botroot "" at line 1, column 1.
Was expecting one of:
    <EOF> 
    "cat" ...
    "fs" ...
    "sh" ...
    "cd" ...
    "cp" ...
    "copyFromLocal" ...
    "copyToLocal" ...
    "dump" ...
    "describe" ...
    "aliases" ...
    "explain" ...
    "help" ...
    "kill" ...
    "ls" ...
    "mv" ...
    "mkdir" ...
    "pwd" ...
    "quit" ...
    "register" ...
    "rm" ...
    "rmf" ...
    "set" ...
    "illustrate" ...
    "run" ...
    "exec" ...
    "scriptDone" ...
    "" ...
    <EOL> ...
    ";" ...

    at org.apache.pig.tools.pigscript.parser.PigScriptParser.generateParseException(PigScriptParser.java:1118)
    at org.apache.pig.tools.pigscript.parser.PigScriptParser.handle_invalid_command(PigScriptParser.java:934)
    at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:527)
    at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:189)
    at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:165)
    at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:69)
    at org.apache.pig.Main.run(Main.java:490)
    at org.apache.pig.Main.main(Main.java:111)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:616)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:156)

3 answers:

Answer 0 (score: 0)

As mentioned in the comments, the problem is easily solved by adding a leading slash so that the path is absolute:

load '/input/abalone.txt'
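Why the leading slash matters can be sketched in Python. The helper resolve_load_path is hypothetical, not a Pig or Hadoop API; it assumes the rule that a relative path is joined onto the current working directory (on HDFS, normally the user's home directory /user/<name>) and then qualified with the default file system URI. Under that assumption it reproduces the path from the DUMP output, hdfs://localhost:8020/user/notroot/input/abalone.txt, while the absolute path resolves to /input/abalone.txt, which is where hadoop fs -ls shows the file.

```python
import posixpath


def resolve_load_path(path, default_fs, working_dir):
    """Hypothetical helper: roughly mimic how a LOAD path gets qualified.

    Assumption: a path without a leading '/' is joined onto the working
    directory, then prefixed with the default file system URI.
    """
    if "://" in path:
        return path  # already fully qualified, e.g. hdfs://host:port/...
    if not path.startswith("/"):
        path = posixpath.join(working_dir, path)
    return default_fs + posixpath.normpath(path)


# Relative path, as in the question: lands under the HDFS home directory.
print(resolve_load_path("input/abalone.txt", "hdfs://localhost:8020", "/user/notroot"))
# Absolute path, as in the answer: used as-is.
print(resolve_load_path("/input/abalone.txt", "hdfs://localhost:8020", "/user/notroot"))
# Assumed local-mode case: a bare file name resolved against the local home directory.
print(resolve_load_path("abalone.txt", "file:", "/home/notroot"))
```

The last call is only a guess at where the log's file:/home/notroot/abalone.txt came from: it is consistent with the same rule applied to the local file system, perhaps in a different session, but the question does not confirm that.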

Answer 1 (score: 0)

Use the path /input/inputfile, i.e. with a leading slash:

abalone = LOAD '/input/abalone.txt' using PigStorage(',') AS (sex:chararray,length:double,diameter:double,height:double,w_weight:double,s_weight:double,v_weight:double,shell_weight:double,rings:int);
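To rule out the data itself as the cause, here is a rough Python model of what this LOAD statement asks for: PigStorage(',') splits each line on commas, and the AS clause assigns the fields to the nine schema names by position. The mapping of chararray/double/int to str/float/int and the parse_line helper are assumptions for illustration only; as in Pig, a missing field or a failed cast is modeled as null (None here). The sample rows in the question each have nine comma-separated fields, so they match the schema.

```python
# Field names copied from the AS clause; Pig types approximated by Python types.
SCHEMA = [
    ("sex", str), ("length", float), ("diameter", float), ("height", float),
    ("w_weight", float), ("s_weight", float), ("v_weight", float),
    ("shell_weight", float), ("rings", int),
]


def parse_line(line, delimiter=","):
    """Hypothetical helper: split one line and cast fields by position.

    A missing field or a failed cast becomes None, standing in for Pig's null.
    """
    fields = line.rstrip("\n").split(delimiter)
    record = {}
    for i, (name, cast) in enumerate(SCHEMA):
        try:
            record[name] = cast(fields[i])
        except (IndexError, ValueError):
            record[name] = None
    return record


row = parse_line("M,0.455,0.365,0.095,0.514,0.2245,0.101,0.15,15")
print(row["sex"], row["length"], row["rings"])  # M 0.455 15
```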

Answer 2 (score: 0)

When assigning to an alias, make sure there is a space before or after the '='. For example:

abalone= LOAD '/input/abalone.txt' using PigStorage(',') AS (sex:chararray,length:double,diameter:double,height:double,w_weight:double,s_weight:double,v_weight:double,shell_weight:double,rings:int);

or

abalone =LOAD '/input/abalone.txt' using PigStorage(',') AS (sex:chararray,length:double,diameter:double,height:double,w_weight:double,s_weight:double,v_weight:double,shell_weight:double,rings:int);