
I tried to use ParallelALSFactorizationJob, but it crashes here:

Exception in thread "main" java.lang.NullPointerException
    at java.lang.ProcessBuilder.start(ProcessBuilder.java:1012)
    at org.apache.hadoop.util.Shell.runCommand(Shell.java:445)
    at org.apache.hadoop.util.Shell.run(Shell.java:418)
    at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:650)
    at org.apache.hadoop.util.Shell.execCommand(Shell.java:739)

The command-line help mentions using the file system, but it seems to require Hadoop. How can I run it on Windows, given that the mahout.cmd file is broken:

"===============DEPRECATION WARNING==============="
"This script is no longer supported for new drivers as of Mahout 0.10.0"
"Mahout's bash script is supported and if someone wants to contribute a fix for this"
"it would be appreciated."

So is it possible at all (ALS on Windows, without Hadoop)?

Mahout is a community-driven project, and its community is very strong.

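"Apache Mahout is one of the first and most prominent Big Data machine learning platforms. It implements machine learning algorithms on top of distributed processing platforms such as Hadoop and Spark."

-Tiwary, C. (2015). Learning Apache Mahout.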

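Apache Spark is an open-source, in-memory, general-purpose computing system that runs on both Windows and Unix-like systems. Instead of the disk-based computation used by Hadoop, Spark uses cluster memory to load all the data into memory, where it can be queried repeatedly.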

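"As Spark is gaining popularity among data scientists, the Mahout community is also quickly working on making Mahout algorithms function on Spark's execution engine to speed up its calculation 10 to 100 times faster. Mahout provides [...] for creating recommendations using Spark."

-Gupta, A. (2015). Learning Apache Mahout Classification.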

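(This last book also provides a step-by-step guide to using Mahout's Spark shell. The authors do not use Windows, and it is not clear whether they use Hadoop. For more on this topic, see the "Implementation" section at https://mahout.apache.org/users/sparkbindings/play-with-shell.html.)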

Beyond that, you can also build recommendation engines with Spark itself, using the DataFrames, RDDs, Pipelines, and Transforms available in Spark MLlib.

"In Spark, (...) the Alternating Least Squares (ALS) method is used for generating model-based collaborative filtering."

-Gorakala, S. (2016). Building Recommendation Engines.
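
To give a feel for what "alternating least squares" means here, below is a minimal rank-1 sketch in plain Python. It is only an illustration, not Spark MLlib's or Mahout's implementation: the function names, the regularization value, and the tiny ratings set are all made up for this example. Each pass fixes the item factors and solves a regularized least-squares problem for every user, then swaps the roles.

```python
# Rank-1 alternating least squares (ALS) on a tiny, made-up ratings set.
# Illustration only: this is NOT how Spark MLlib or Mahout implement ALS.

def als_rank1(ratings, n_users, n_items, reg=0.1, iterations=20):
    """Fit rating ~= user_f[u] * item_f[i] by alternating closed-form updates.

    ratings: list of (user, item, value) triples for observed entries only.
    reg must be > 0 so every denominator below stays positive.
    """
    user_f = [1.0] * n_users
    item_f = [1.0] * n_items
    for _ in range(iterations):
        # Hold item factors fixed; solve the 1-D ridge regression per user.
        for u in range(n_users):
            seen = [(i, r) for (uu, i, r) in ratings if uu == u]
            num = sum(r * item_f[i] for i, r in seen)
            den = reg + sum(item_f[i] ** 2 for i, _ in seen)
            user_f[u] = num / den
        # Hold user factors fixed; solve the 1-D ridge regression per item.
        for i in range(n_items):
            seen = [(u, r) for (u, ii, r) in ratings if ii == i]
            num = sum(r * user_f[u] for u, r in seen)
            den = reg + sum(user_f[u] ** 2 for u, _ in seen)
            item_f[i] = num / den
    return user_f, item_f


def squared_error(ratings, user_f, item_f):
    """Sum of squared residuals over the observed ratings."""
    return sum((r - user_f[u] * item_f[i]) ** 2 for u, i, r in ratings)


def objective(ratings, user_f, item_f, reg=0.1):
    """Regularized loss that each alternating step minimizes exactly."""
    penalty = sum(x * x for x in user_f) + sum(x * x for x in item_f)
    return squared_error(ratings, user_f, item_f) + reg * penalty


# Hypothetical data: 3 users, 3 items, 6 observed ratings.
RATINGS = [
    (0, 0, 5.0), (0, 1, 3.0),
    (1, 0, 4.0), (1, 2, 1.0),
    (2, 1, 3.0), (2, 2, 1.0),
]

if __name__ == "__main__":
    users, items = als_rank1(RATINGS, n_users=3, n_items=3)
    # Predict a rating that was never observed: user 0 on item 2.
    print("predicted rating for (user 0, item 2):", users[0] * items[2])
    print("squared error on observed ratings:", squared_error(RATINGS, users, items))
```

A real recommender would use more than one latent factor per user and item and would distribute the per-user and per-item solves, which is the part Spark handles for you.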

At this point, before answering your question, there is one more question to answer: can we run Spark without Hadoop?

So, yes: it is possible to use the ALS method on Windows with Spark (without Hadoop).

Can Apache Mahout ALS work without Hadoop?
