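I have written my own Hadoop program, and I can run it in pseudo-distributed mode on my own laptop. However, when I put the program on a cluster that runs the Hadoop example jars fine, it launches a local job by default even though I specify HDFS file paths. The output is below. Any suggestions?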
./hadoop -jar MyRandomForest_oob_distance.jar hdfs://montana-01:8020/user/randomforest/input/genotype1.txt hdfs://montana-01:8020/user/randomforest/input/phenotype1.txt hdfs://montana-01:8020/user/randomforest/output1_distance/ hdfs://montana-01:8020/user/randomforest/input/genotype101.txt hdfs://montana-01:8020/user/randomforest/input/phenotype101.txt 33 500 1
12/03/16 16:21:25 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
12/03/16 16:21:25 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
12/03/16 16:21:25 INFO mapred.JobClient: Running job: job_local_0001
12/03/16 16:21:25 INFO mapred.MapTask: io.sort.mb = 100
12/03/16 16:21:25 INFO mapred.MapTask: data buffer = 79691776/99614720
12/03/16 16:21:25 INFO mapred.MapTask: record buffer = 262144/327680
12/03/16 16:21:25 WARN mapred.LocalJobRunner: job_local_0001
java.io.FileNotFoundException: File /user/randomforest/input/genotype1.txt does not exist.
at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:361)
at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:245)
at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.<init>(ChecksumFileSystem.java:125)
at org.apache.hadoop.fs.ChecksumFileSystem.open(ChecksumFileSystem.java:283)
at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:356)
at Data.Data.loadData(Data.java:103)
at MapReduce.DearMapper.loadData(DearMapper.java:261)
at MapReduce.DearMapper.setup(DearMapper.java:332)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:177)
12/03/16 16:21:26 INFO mapred.JobClient: map 0% reduce 0%
12/03/16 16:21:26 INFO mapred.JobClient: Job complete: job_local_0001
12/03/16 16:21:26 INFO mapred.JobClient: Counters: 0
Total Running time is: 1 secs
Answer 0 (score: 10)
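LocalJobRunner was chosen because your configuration most likely has the mapred.job.tracker property set to local, or not set at all (in which case it defaults to local). To check, go to "wherever you extracted/installed hadoop"/etc/hadoop/ and see whether the file mapred-site.xml exists (for me it did not; there was only a file called mapred-site.xml.template). In that file (create it if it does not exist), make sure it contains the following property: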
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
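To run that check from the command line, here is a minimal sketch that prints the value of one property from a site file. The function name hadoop_conf_value is made up for this illustration, and it assumes the usual layout where each <name> and <value> element sits on a single line. It is not a general XML parser.

```shell
# Print the <value> that follows a given <name> in a Hadoop *-site.xml file.
# Assumption: each <name>...</name> and <value>...</value> fits on one line.
# Limitation: XML comments are not skipped, so a commented-out property
# would still be reported.
# Usage: hadoop_conf_value PROPERTY_NAME FILE
hadoop_conf_value() {
    [ -f "$2" ] || return 1
    awk -v want="$1" '
        /<name>/ {
            # Remember the most recent property name seen.
            name = $0
            sub(/.*<name>[ \t]*/, "", name)
            sub(/[ \t]*<\/name>.*/, "", name)
        }
        /<value>/ && name == want {
            # First value after the wanted name: print it and stop.
            value = $0
            sub(/.*<value>[ \t]*/, "", value)
            sub(/[ \t]*<\/value>.*/, "", value)
            print value
            found = 1
            exit
        }
        END { exit !found }   # non-zero status when the property is absent
    ' "$2"
}
```

For example, assuming HADOOP_CONF_DIR points at your configuration directory, hadoop_conf_value mapreduce.framework.name "$HADOOP_CONF_DIR/mapred-site.xml" should print yarn on a correctly configured Hadoop 2 client. A non-zero exit status means the property is not set in that file, in which case Hadoop falls back to its default of local.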
See the source for org.apache.hadoop.mapred.JobClient.init(JobConf).
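What is the value of this configuration property in the Hadoop configuration on the machine you are submitting the job from? Also confirm that the hadoop executable you are running references this configuration (and that you do not have two or more installations configured differently): type which hadoop and trace any symlinks you come across.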
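The symlink tracing can be done in one step. This is a sketch that assumes readlink -f is available (GNU coreutils, and recent macOS or BSD releases); resolve_command is a name chosen for this example.

```shell
# Resolve a command name to the real file behind any chain of symlinks.
# Assumption: `readlink -f` is supported on this system.
# Usage: resolve_command hadoop
resolve_command() {
    # command -v fails (non-zero) when the name is not found on PATH.
    cmd_path=$(command -v "$1") || return 1
    readlink -f "$cmd_path"
}
```

Running resolve_command hadoop prints the real path of the launcher script, which tells you which installation, and therefore which configuration directory, is actually in use.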
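Alternatively, if you know the JobTracker host and port number, you can override this at submission time with the -jt option. It is a generic option, so it is only honored when the job parses its arguments with GenericOptionsParser (for example via ToolRunner), which the warning in your log suggests it does not do yet: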
hadoop jar MyRandomForest_oob_distance.jar -jt hostname:port hdfs://montana-01:8020/user/randomforest/input/genotype1.txt hdfs://montana-01:8020/user/randomforest/input/phenotype1.txt hdfs://montana-01:8020/user/randomforest/output1_distance/ hdfs://montana-01:8020/user/randomforest/input/genotype101.txt hdfs://montana-01:8020/user/randomforest/input/phenotype101.txt 33 500 1
Answer 1 (score: 4)
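If you are using Hadoop 2 and your job runs locally instead of on the cluster, make sure mapred-site.xml contains the mapreduce.framework.name property with a value of yarn. You also need to configure yarn-site.xml.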
Answer 2 (score: 2)
I had the same problem: every MapReduce v2 (MRv2) or YARN job ran only with mapred.LocalJobRunner:
INFO mapred.LocalJobRunner: Starting task: attempt_local284299729_0001_m_000000_0
The ResourceManager and NodeManagers were reachable, and mapreduce.framework.name was set to yarn.
Setting HADOOP_MAPRED_HOME before executing the job fixed the problem for me.
export HADOOP_MAPRED_HOME=/usr/lib/hadoop-mapreduce
Cheers, Dan
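If the MapReduce directory differs between machines, a small guard avoids exporting a path that does not exist. This is only a sketch: set_mapred_home is a hypothetical helper, and /usr/lib/hadoop-mapreduce is simply the path from this answer, which may be different on your installation.

```shell
# Export HADOOP_MAPRED_HOME only when the given directory exists.
# Usage: set_mapred_home /usr/lib/hadoop-mapreduce
set_mapred_home() {
    if [ -d "$1" ]; then
        HADOOP_MAPRED_HOME=$1
        export HADOOP_MAPRED_HOME
    else
        # Leave any existing value untouched and report the problem.
        echo "warning: $1 is not a directory; HADOOP_MAPRED_HOME left unchanged" >&2
        return 1
    fi
}
```

Calling set_mapred_home /usr/lib/hadoop-mapreduce before submitting the job then either sets the variable or tells you that the directory is missing on this machine.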