python hadoop: MapReduce job not working

Date: 2016-05-30 13:18:03

Tags: python hadoop mapreduce hdfs sequencefile

My MapReduce program processes 20 videos, so I have uploaded those 20 videos to HDFS. When I start running the MapReduce code, nothing progresses on the terminal. When I run the command pydoop submit --upload-file-to-cache stage1.py stage1 path_directory stage1_output, it just stops. The terminal log is as follows.

hduser@Barca-FC:/home/uday/Project/final project/algo2$ pydoop submit --upload-file-to-cache twodct.py twodct  path_directory twodct_output
16/05/30 18:19:20 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
16/05/30 18:19:21 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
16/05/30 18:19:22 WARN mapreduce.JobSubmitter: No job jar file set.  User classes may not be found. See Job or Job#setJar(String).
16/05/30 18:19:22 INFO input.FileInputFormat: Total input paths to process : 1
16/05/30 18:19:22 INFO mapreduce.JobSubmitter: number of splits:1
16/05/30 18:19:22 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1464609268645_0002
16/05/30 18:19:23 INFO mapred.YARNRunner: Job jar is not present. Not adding any jar to the list of resources.
16/05/30 18:19:23 INFO impl.YarnClientImpl: Submitted application application_1464609268645_0002
16/05/30 18:19:23 INFO mapreduce.Job: The url to track the job: http://Barca-FC:8088/proxy/application_1464609268645_0002/
16/05/30 18:19:23 INFO mapreduce.Job: Running job: job_1464609268645_0002
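For reference, a script submitted with pydoop submit has to define its mapper and reducer through the Pydoop MapReduce API and expose a __main__() entry point. The sketch below only illustrates that expected structure; it is not my actual twodct.py, which does the video processing:

import pydoop.mapreduce.api as api
import pydoop.mapreduce.pipes as pipes


class Mapper(api.Mapper):
    def map(self, context):
        # Forward each input record; the real mapper works on video data.
        context.emit(context.key, context.value)


class Reducer(api.Reducer):
    def reduce(self, context):
        # Combine all values collected for a key into one output record.
        context.emit(context.key, list(context.values))


def __main__():
    # pydoop submit looks for this entry point in the uploaded module.
    pipes.run_task(pipes.Factory(Mapper, reducer_class=Reducer))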

My Hadoop configuration files look like this:

mapred-site.xml:
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:54311</value>
    <description>The host and port that the MapReduce job tracker runs
    at. If "local", then jobs are run in-process as a single map
    and reduce task.
    </description>
  </property>
  <property>
    <name>mapred.reduce.tasks</name>
    <value>1</value>
  </property>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>

hdfs-site.xml:

<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
    <description>Default block replication.
    The actual number of replications can be specified when the file is created.
    The default is used if replication is not specified in create time.
    </description>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/usr/local/hadoop_store/hdfs/namenode</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:/usr/local/hadoop_store/hdfs/datanode</value>
  </property>
  <property>
    <name>dfs.webhdfs.enabled</name>
    <value>true</value>
  </property>
</configuration>

core-site.xml:

<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/app/hadoop/tmp</value>
    <description>A base for other temporary directories.</description>
  </property>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:54310</value>
    <description>The name of the default file system. A URI whose
    scheme and authority determine the FileSystem implementation. The
    uri's scheme determines the config property (fs.SCHEME.impl) naming
    the FileSystem implementation class. The uri's authority is used to
    determine the host, port, etc. for a filesystem.</description>
  </property>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
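One thing I am unsure about: core-site.xml above carries two default filesystem URIs, the deprecated fs.default.name (hdfs://localhost:54310) and fs.defaultFS (hdfs://localhost:9000). A quick sketch like the following (path_directory stands in for my real input directory, as in the submit command) can confirm which namenode the Pydoop client actually reaches and that the input is visible:

import pydoop.hdfs as hdfs

# Lists the HDFS root via the default filesystem from the client config;
# a connection error here would hint at the fs.default.name/fs.defaultFS mismatch.
print(hdfs.ls("/"))

# The input directory passed to pydoop submit should be listed and non-empty.
print(hdfs.ls("path_directory"))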

Can anyone tell me why my MapReduce job is not making progress? Thanks in advance!

0 Answers

No answers yet.