Question

I am trying out the Pig demo code from the online Pig manual.

First, I create a test file named myfile.txt. It contains six integers on two lines:

4 5 3 
1 2 3

I put the file into HDFS with:

hadoop fs -copyFromLocal myfile.txt /user/myfile.txt

Then I run:

A = LOAD '/user/myfile.txt';
DUMP A;

But I get the following error message:

2014-10-08 14:15:54,259 [main] INFO  org.apache.pig.tools.pigstats.ScriptState - Pig features used in the script: UNKNOWN
2014-10-08 14:15:54,594 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false
2014-10-08 14:15:54,692 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1
2014-10-08 14:15:54,693 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1
2014-10-08 14:15:54,909 [main] INFO  org.apache.pig.tools.pigstats.ScriptState - Pig script settings are added to the job
2014-10-08 14:15:54,998 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
2014-10-08 14:15:55,006 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Using reducer estimator: org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.InputSizeReducerEstimator
2014-10-08 14:15:55,013 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.InputSizeReducerEstimator - BytesPerReducer=1000000000 maxReducers=999 totalInputFileSize=12
2014-10-08 14:15:55,015 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting Parallelism to 1
2014-10-08 14:15:55,016 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - creating jar file Job7804857093829884774.jar
2014-10-08 14:15:58,229 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - jar file Job7804857093829884774.jar created
2014-10-08 14:15:58,266 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up single store job
2014-10-08 14:15:58,304 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission.
2014-10-08 14:15:58,353 [JobControl] WARN  org.apache.hadoop.mapred.JobClient - Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
2014-10-08 14:15:58,806 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete
2014-10-08 14:15:58,964 [JobControl] WARN  org.apache.hadoop.conf.Configuration - fs.default.name is deprecated. Instead, use fs.defaultFS
2014-10-08 14:15:58,968 [JobControl] WARN  org.apache.hadoop.conf.Configuration - dfs.https.address is deprecated. Instead, use dfs.namenode.https-address
2014-10-08 14:15:58,969 [JobControl] WARN  org.apache.hadoop.conf.Configuration - io.bytes.per.checksum is deprecated. Instead, use dfs.bytes-per-checksum
2014-10-08 14:15:59,024 [JobControl] INFO  org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
2014-10-08 14:15:59,025 [JobControl] INFO  org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
2014-10-08 14:15:59,051 [JobControl] INFO  org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths (combined) to process : 1
2014-10-08 14:16:00,533 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - HadoopJobId: job_201410081312_0015
2014-10-08 14:16:00,534 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Processing aliases A
2014-10-08 14:16:00,534 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - detailed locations: M: A[2,4] C:  R: 
2014-10-08 14:16:05,098 [main] WARN  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Ooops! Some job has failed! Specify -stop_on_failure if you want Pig to stop immediately on failure.
2014-10-08 14:16:05,098 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - job job_201410081312_0015 has failed! Stop running all dependent jobs
2014-10-08 14:16:05,099 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete
2014-10-08 14:16:05,109 [main] ERROR org.apache.pig.tools.pigstats.PigStatsUtil - 1 map reduce job(s) failed!
2014-10-08 14:16:05,111 [main] INFO  org.apache.pig.tools.pigstats.SimplePigStats - Script Statistics: 

HadoopVersion   PigVersion  UserId  StartedAt   FinishedAt  Features
2.0.0-cdh4.7.0  0.11.0-cdh4.7.0 hdfs    2014-10-08 14:15:54 2014-10-08 14:16:05 UNKNOWN

Failed!

Failed Jobs:
JobId   Alias   Feature Message Outputs
job_201410081312_0015   A   MAP_ONLY    Message: Job failed!    

**Input(s):
Failed to read data from "/user/myfile.txt"**

It seems that Pig is not connecting to HDFS and therefore cannot access the file. Can anyone help me solve this?

Answer 1

Change the file's permissions. You may not have read access to the file.

In a Linux environment, change the file's permissions with:

chmod 755 myfile.txt

After that, run the copyFromLocal command again.
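
Put together, the steps in this answer might look like the sketch below. It assumes the file name and HDFS path from the question. The `hadoop fs -test`, `-rm`, `-ls` and `-cat` calls are extra checks added here for illustration and are not part of the original answer; the `command -v hadoop` guard only exists so the script can run on a machine without a Hadoop client.

```shell
#!/bin/sh
# Recreate the two-line test file from the question.
printf '4 5 3\n1 2 3\n' > myfile.txt

# Make the local file readable, as the answer suggests.
chmod 755 myfile.txt

# The HDFS steps need a Hadoop client; skip them where none is installed.
if command -v hadoop >/dev/null 2>&1; then
    # Assumption: a stale copy may exist, and -copyFromLocal will not overwrite it.
    hadoop fs -test -e /user/myfile.txt && hadoop fs -rm /user/myfile.txt
    hadoop fs -copyFromLocal myfile.txt /user/myfile.txt

    # Show the owner and permission bits of the HDFS copy, then its contents.
    hadoop fs -ls /user/myfile.txt
    hadoop fs -cat /user/myfile.txt
fi
```

If `hadoop fs -ls` shows that the user running Pig has no read permission on the HDFS copy, that would be consistent with the "Failed to read data" message in the question.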
