How to configure and run Hadoop Pipes on Ubuntu 10.04

Posted: 2012-05-04 00:11:55

Tags: hadoop

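I'm new to Linux and Hadoop. I'm looking for guidance on getting Hadoop running and writing C++ tasks for it. I followed this tutorial to install Hadoop in pseudo-distributed mode: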

http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-single-node-cluster/

It works for Java, but when I run the C++ wordcount example I get this error:

12/05/03 18:23:00 WARN mapred.JobClient: No job jar file set.  User classes may not be found. See JobConf(Class) or JobConf#setJar(String).
Exception in thread "main" org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: hdfs://localhost/user/c1048267/books
    at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:190)
    at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:201)
    at org.apache.hadoop.mapred.JobClient.writeOldSplits(JobClient.java:810)
    at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:781)
    at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:730)
    at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1249)
    at org.apache.hadoop.mapred.pipes.Submitter.runJob(Submitter.java:248)
    at org.apache.hadoop.mapred.pipes.Submitter.run(Submitter.java:479)
    at org.apache.hadoop.mapred.pipes.Submitter.main(Submitter.java:494)

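Please let me know if there are any specific software, hardware or configuration requirements. I'm currently using Ubuntu 10.04 64-bit, Hadoop 0.20.2 and Sun Java 6. Does this platform support Hadoop Pipes? If not, please advise.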

1 answer:

Answer 0 (score: 0)

First, you need to configure HADOOP_CLASSPATH to include all the Pipes-related libraries. Then compile your program; create a Makefile for it like this:

CC = g++
HADOOP_INSTALL = /home/hadoop/hadoop
# 32-bit build; on a 64-bit system use Linux-amd64-64 and remove -m32
PLATFORM = Linux-i386-32
CPPFLAGS = -m32 -I$(HADOOP_INSTALL)/c++/$(PLATFORM)/include

# Recipe lines below must start with a tab character, not spaces
wordcount: wordcount.cpp
	$(CC) $(CPPFLAGS) $< -Wall -L$(HADOOP_INSTALL)/c++/$(PLATFORM)/lib -lhadooppipes \
	-lhadooputils -lpthread -g -O2 -o $@
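For readers new to Pipes, it may help to see what the wordcount.cpp program compiled above actually computes. The sketch below is a hypothetical, standard-library-only simulation of its map and reduce steps (mapLine and reduceAll are made-up names for illustration). It does not use the Pipes headers; a real Pipes program would instead subclass HadoopPipes::Mapper and HadoopPipes::Reducer and be started with HadoopPipes::runTask.

```cpp
#include <map>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical local stand-in for the map phase: split one input line on
// whitespace and emit a (word, "1") pair per token. Pipes passes keys and
// values as strings, so the count is emitted as the string "1".
std::vector<std::pair<std::string, std::string>> mapLine(const std::string& line) {
    std::vector<std::pair<std::string, std::string>> out;
    std::istringstream words(line);
    std::string word;
    while (words >> word) {
        out.emplace_back(word, "1");
    }
    return out;
}

// Hypothetical local stand-in for shuffle + reduce: group the emitted
// values by key (std::map keeps keys sorted, like the framework's sort)
// and sum the counts for each word.
std::map<std::string, int> reduceAll(
        const std::vector<std::pair<std::string, std::string>>& pairs) {
    std::map<std::string, int> counts;
    for (const auto& kv : pairs) {
        counts[kv.first] += std::stoi(kv.second);
    }
    return counts;
}
```

For example, reduceAll(mapLine("the quick the")) yields the counts the=2 and quick=1. Grouping by key with std::map mimics the sort-and-group step the framework performs between map and reduce.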

You need libssl and g++ installed on every machine in the cluster. To compile the wordcount example, run:

make wordcount

Then copy the resulting binary into a bin/ directory in HDFS:

hadoop dfs -mkdir bin
hadoop dfs -put wordcount bin/wordcount

Run the program with the following command:

hadoop pipes -D hadoop.pipes.java.recordreader=true \
             -D hadoop.pipes.java.recordwriter=true \
             -input dft1 -output dft1-out \
             -program bin/wordcount
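The two -D options tell Pipes to use Java's record reader and record writer, so the C++ binary only has to implement map and reduce. The hypothetical helper below (pipesArgs and joinArgs are illustrative names, not part of Hadoop) simply spells out the same command as an argument list, which may be handy if you launch jobs from a C++ driver. It assumes none of the arguments need shell quoting.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: build the argument list for the `hadoop pipes`
// command shown above. The input, output and program paths are HDFS
// paths, not local ones.
std::vector<std::string> pipesArgs(const std::string& input,
                                   const std::string& output,
                                   const std::string& program) {
    return {
        "hadoop", "pipes",
        "-D", "hadoop.pipes.java.recordreader=true",  // Java reads input records
        "-D", "hadoop.pipes.java.recordwriter=true",  // Java writes output records
        "-input", input,
        "-output", output,
        "-program", program,
    };
}

// Join the arguments into a single command line separated by spaces.
// No quoting is done: the sketch assumes plain arguments without spaces
// or shell metacharacters.
std::string joinArgs(const std::vector<std::string>& args) {
    std::string line;
    for (const auto& a : args) {
        if (!line.empty()) line += ' ';
        line += a;
    }
    return line;
}
```

joinArgs(pipesArgs("dft1", "dft1-out", "bin/wordcount")) reproduces the command above on one line. Note that the -output directory (dft1-out here) must not already exist in HDFS, otherwise Hadoop refuses to start the job.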

The second thing I notice is this part of your error:

See JobConf(Class) or JobConf#setJar(String). Exception in thread "main" org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: hdfs://localhost/user/c1048267/books

Are you sure that directory exists in HDFS? Regards
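The path in that error shows how Hadoop qualified the relative input path books: relative paths are resolved against the user's HDFS home directory, /user/<username>, on the default file system. The sketch below reproduces that rule in C++ under simplifying assumptions (qualifyHdfsPath is a hypothetical name; it ignores "." and ".." segments and any working directory other than the home directory).

```cpp
#include <string>

// Hypothetical sketch of HDFS path qualification: a relative path is
// resolved under /user/<user> on the default file system, which is why
// "books" became hdfs://localhost/user/c1048267/books.
std::string qualifyHdfsPath(const std::string& path,
                            const std::string& defaultFs,
                            const std::string& user) {
    // Already fully qualified (has a scheme such as hdfs://): keep it.
    if (path.find("://") != std::string::npos) {
        return path;
    }
    // Absolute path: only the file system prefix is added.
    if (!path.empty() && path[0] == '/') {
        return defaultFs + path;
    }
    // Relative path: resolved under the user's home directory.
    return defaultFs + "/user/" + user + "/" + path;
}
```

So the job fails because nothing was uploaded to /user/c1048267/books. Uploading the input first, for example with hadoop dfs -put <local-books-dir> books (where <local-books-dir> stands for your local input directory), and checking it with hadoop dfs -ls books should make the input path exist.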

For a complete guide, see this link.