我想运行我在Mahout In Action中找到的代码:
package org.help;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;
import org.apache.mahout.math.DenseVector;
import org.apache.mahout.math.NamedVector;
import org.apache.mahout.math.VectorWritable;
public class SeqPrep {
public static void main(String args[]) throws IOException{
List<NamedVector> apples = new ArrayList<NamedVector>();
NamedVector apple;
apple = new NamedVector(new DenseVector(new double[]{0.11, 510, 1}), "small round green apple");
apples.add(apple);
Configuration conf = new Configuration();
FileSystem fs = FileSystem.get(conf);
Path path = new Path("appledata/apples");
SequenceFile.Writer writer = new SequenceFile.Writer(fs, conf, path, Text.class, VectorWritable.class);
VectorWritable vec = new VectorWritable();
for(NamedVector vector : apples){
vec.set(vector);
writer.append(new Text(vector.getName()), vec);
}
writer.close();
SequenceFile.Reader reader = new SequenceFile.Reader(fs, new Path("appledata/apples"), conf);
Text key = new Text();
VectorWritable value = new VectorWritable();
while(reader.next(key, value)){
System.out.println(key.toString() + " , " + value.get().asFormatString());
}
reader.close();
}
}
我用以下代码编译它:
$ javac -classpath :/usr/local/hadoop-1.0.3/hadoop-core-1.0.3.jar:/home/hduser/mahout/trunk/core/target/mahout-core-0.8-SNAPSHOT.jar:/home/hduser/mahout/trunk/core/target/mahout-core-0.8-SNAPSHOT-job.jar:/home/hduser/mahout/trunk/core/target/mahout-core-0.8-SNAPSHOT-sources.jar -d myjavac/ SeqPrep.java
我把它搞定:
$ jar -cvf SeqPrep.jar -C myjavac/ .
现在我想在我的本地hadoop节点上运行它。我试过了:
hadoop jar SeqPrep.jar org.help.SeqPrep
但我明白了:
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/mahout/math/Vector
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:247)
at org.apache.hadoop.util.RunJar.main(RunJar.java:149)
所以我尝试使用libjars参数:
$ hadoop jar SeqPrep.jar org.help.SeqPrep -libjars /home/hduser/mahout/trunk/core/target/mahout-core-0.8-SNAPSHOT.jar -libjars /home/hduser/mahout/trunk/core/target/mahout-core-0.8-SNAPSHOT-job.jar -libjars /home/hduser/mahout/trunk/core/target/mahout-core-0.8-SNAPSHOT-sources.jar -libjars /home/hduser/mahout/trunk/math/target/mahout-math-0.8-SNAPSHOT.jar -libjars /home/hduser/mahout/trunk/math/target/mahout-math-0.8-SNAPSHOT-sources.jar
并遇到了同样的问题。我不知道还有什么可以尝试。
我最终的目标是能够将hadoop fs上的.csv文件读入稀疏矩阵,然后将其乘以随机向量。
编辑:看起来Razvan得到了它(注意:请参阅下面的另一种方法,这样做不会弄乱你的hadoop安装)。供参考:
$ find /usr/local/hadoop-1.0.3/. |grep mah
/usr/local/hadoop-1.0.3/./lib/mahout-core-0.8-SNAPSHOT-tests.jar
/usr/local/hadoop-1.0.3/./lib/mahout-core-0.8-SNAPSHOT.jar
/usr/local/hadoop-1.0.3/./lib/mahout-core-0.8-SNAPSHOT-job.jar
/usr/local/hadoop-1.0.3/./lib/mahout-core-0.8-SNAPSHOT-sources.jar
/usr/local/hadoop-1.0.3/./lib/mahout-math-0.8-SNAPSHOT-sources.jar
/usr/local/hadoop-1.0.3/./lib/mahout-math-0.8-SNAPSHOT-tests.jar
/usr/local/hadoop-1.0.3/./lib/mahout-math-0.8-SNAPSHOT.jar
然后:
$hadoop jar SeqPrep.jar org.help.SeqPrep
small round green apple , small round green apple:{0:0.11,1:510.0,2:1.0}
编辑:我正在尝试这样做而不将mahout jar复制到hadoop lib /
$ rm /usr/local/hadoop-1.0.3/lib/mahout-*
然后当然:
hadoop jar SeqPrep.jar org.help.SeqPrep
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/mahout/math/Vector
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:247)
at org.apache.hadoop.util.RunJar.main(RunJar.java:149)
Caused by: java.lang.ClassNotFoundException: org.apache.mahout.math.Vector
at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
当我尝试mahout作业文件时:
$hadoop jar ~/mahout/trunk/core/target/mahout-core-0.8-SNAPSHOT-job.jar org.help.SeqPrep
Exception in thread "main" java.lang.ClassNotFoundException: org.help.SeqPrep
at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:247)
at org.apache.hadoop.util.RunJar.main(RunJar.java:149)
如果我尝试包含我制作的.jar文件:
$ hadoop jar ~/mahout/trunk/core/target/mahout-core-0.8-SNAPSHOT-job.jar SeqPrep.jar org.help.SeqPrep
Exception in thread "main" java.lang.ClassNotFoundException: SeqPrep.jar
编辑:显然我一次只能发送一个罐子给hadoop。这意味着我需要将我制作的类添加到mahout核心作业文件中:
~/mahout/trunk/core/target$ cp mahout-core-0.8-SNAPSHOT-job.jar mahout-core-0.8-SNAPSHOT-job.jar_backup
~/mahout/trunk/core/target$ cp ~/workspace/seqprep/bin/org/help/SeqPrep.class .
~/mahout/trunk/core/target$ jar uf mahout-core-0.8-SNAPSHOT-job.jar SeqPrep.class
然后:
~/mahout/trunk/core/target$ hadoop jar mahout-core-0.8-SNAPSHOT-job.jar org.help.SeqPrep
Exception in thread "main" java.lang.ClassNotFoundException: org.help.SeqPrep
编辑好的,现在我可以在不弄乱我的hadoop安装的情况下完成。我在之前的编辑中更新了.jar错误。它应该是:
~/mahout/trunk/core/target$ jar uf mahout-core-0.8-SNAPSHOT-job.jar org/help/SeqPrep.class
然后:
~/mahout/trunk/core/target$ hadoop jar mahout-core-0.8-SNAPSHOT-job.jar org.help.SeqPrep
small round green apple , small round green apple:{0:0.11,1:510.0,2:1.0}
答案 0 :(得分:11)
您需要使用Mahout提供的“job”JAR文件。它打包了所有依赖项。您还需要将类添加到其中。这就是所有Mahout示例的工作方式。你不应该把Mahout jar放在Hadoop lib中,因为那种在Hadoop中“安装”程序太深了。
答案 1 :(得分:7)
如果您将从https://github.com/tdunning/MiA存储库中获取示例代码,那么它包含可用于Maven的pom.xml
文件。当您使用mvn package
编译代码时,它将在mia-0.1-job.jar
目录中创建target
- 此存档包含除Hadoop之外的所有依赖项,因此您可以在Hadoop集群上运行它而不会出现问题< / p>
答案 2 :(得分:0)
<dependency>
<groupId>org.apache.mahout</groupId>
<artifactId>mahout-math</artifactId>
<version>0.7</version>
</dependency>
<dependency>
<groupId>org.apache.mahout</groupId>
<artifactId>mahout-collections</artifactId>
<version>1.0</version>
</dependency>
答案 3 :(得分:0)
我所做的是用我的jar和所有mahout jar文件设置HADOOP_CLASSPATH 如下图所示。
export HADOOP_CLASSPATH = / home / xxx / my.jar:/opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/mahout/mahout-core-0.7-cdh4。 3.0.jar:/opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/mahout/mahout-core-0.7-cdh4.3.0-job.jar中:/ opt / Cloudera的/包裹/ CDH-4.3.0-1.cdh4.3.0.p0.22 / LIB /象夫/亨利马乌-例子-0.7-cdh4.3.0.jar:/opt/cloudera/parcels/CDH-4.3.0-1.cdh4 .3.0.p0.22 / LIB /象夫/亨利马乌-例子-0.7-cdh4.3.0-job.jar:/opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/象夫/象夫集成-0.7-cdh4.3.0.jar:/opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/mahout/mahout-math-0.7-cdh4.3.0的.jar
然后我就能跑了 hadoop com.mycompany.mahout.CSVtoVector iris / nb / iris1.csv iris / nb / data / iris.seq
所以你必须在HADOOP_CLASSPATH中包含你的所有罐子和mahout罐子然后你可以用你的班级来运行你的班级
hadoop <classname>