Question

I am trying to run Mahout using .\bin\hadoop jar path_to_mahout_jar etc.

It only works when the input is a local file. When I try to use a file from the Hadoop file system, I get this error:

Exception in thread "main" java.io.FileNotFoundException: input (The system cannot find the file specified)
        at java.io.FileInputStream.open(Native Method)
        at java.io.FileInputStream.<init>(FileInputStream.java:120)
        at org.apache.mahout.classifier.sgd.TrainLogistic.open(TrainLogistic.java:316)
        at org.apache.mahout.classifier.sgd.TrainLogistic.mainToOutput(TrainLogistic.java:75)
        at org.apache.mahout.classifier.sgd.TrainLogistic.main(TrainLogistic.java:64)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:156)

However, when I look in HDFS, I can see the file.

Answer 1

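Oddly enough, for me Mahout was looking for the file in a directory in HDFS. To make Mahout look in my local file system, I had to give it a file:/// URI. Maybe you should try an hdfs:// URI, as Sean suggested for your problem.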
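To make the scheme distinction concrete, here is a minimal sketch using only java.net.URI from the JDK. The class name InputSchemeCheck, the host name namenode:8020, and the paths are placeholders, not values from the question. It only shows that a bare name such as input carries no scheme at all, so which file system it ends up on is left to whatever code resolves it.

```java
import java.net.URI;

public class InputSchemeCheck {
    // Returns the URI scheme of an input location, or "none" when the
    // location is a bare path such as "input".
    static String schemeOf(String location) {
        String scheme = URI.create(location).getScheme();
        return scheme == null ? "none" : scheme;
    }

    public static void main(String[] args) {
        // Hypothetical locations; the host name and paths are placeholders.
        String[] locations = {
            "input",
            "file:///tmp/input",
            "hdfs://namenode:8020/user/me/input"
        };
        for (String location : locations) {
            System.out.println(location + " -> scheme: " + schemeOf(location));
        }
    }
}
```

Spelling out file:/// or hdfs:// removes that ambiguity, which is probably why the explicit URI helped here.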

Answer 2

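The TrainLogistic algorithm (and some other classification algorithms) cannot be run on HDFS.

Check this link, which says it can only run on a single machine.

Good luck!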

Answer 3

If you are working locally you can use java.io, but if you are working with HDFS you have to use hadoop.io operations. Maybe the following links can help you:

https://sites.google.com/site/hadoopandhive/home/how-to-read-all-files-in-a-directory-in-hdfs-using-hadoop-filesystem-api

https://sites.google.com/site/hadoopandhive/home/how-to-write-a-file-in-hdfs-using-hadoop

https://sites.google.com/site/hadoopandhive/home/hadoop-how-to-read-a-file-from-hdfs
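As a rough illustration of the java.io point, the sketch below uses only the JDK. The class name LocalOnlyOpen and the HDFS location are hypothetical. It tries to open an hdfs:// string with java.io.FileInputStream, the same class that appears in the stack trace in the question. FileInputStream treats its argument as a path on the local file system, so the scheme is not interpreted.

```java
import java.io.FileInputStream;
import java.io.IOException;

public class LocalOnlyOpen {
    // Tries to open a location with plain java.io and reports whether it worked.
    // FileInputStream interprets the string as a local path, never as a URI.
    static boolean canOpenWithJavaIo(String location) {
        try (FileInputStream in = new FileInputStream(location)) {
            return true;
        } catch (IOException e) {
            // FileNotFoundException is a subclass of IOException.
            return false;
        }
    }

    public static void main(String[] args) {
        // Hypothetical HDFS location; the host name and path are placeholders.
        String hdfsLocation = "hdfs://namenode:8020/user/me/input";
        System.out.println("java.io could open it: " + canOpenWithJavaIo(hdfsLocation));
    }
}
```

For files that really live in HDFS, the usual route is Hadoop's org.apache.hadoop.fs.FileSystem API (for example FileSystem.get followed by open), which is what the pages linked above walk through.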

Running Mahout with Hadoop: FileNotFoundError

3 answers: