Can I do multiclass classification with Apache Spark Support Vector Machines?

Time: 2016-12-02 05:23:18

Tags: machine-learning svm apache-spark-mllib

When I tried Apache Spark's SVM on a multiclass classification problem, I got the following error. Can someone explain whether there is a way to do multiclass SVM classification using Apache Spark MLlib?

Exception in thread "main" org.apache.spark.SparkException: Input validation failed.
    at org.apache.spark.mllib.regression.GeneralizedLinearAlgorithm.run(GeneralizedLinearAlgorithm.scala:251)
    at org.apache.spark.mllib.regression.GeneralizedLinearAlgorithm.run(GeneralizedLinearAlgorithm.scala:229)
    at org.apache.spark.mllib.classification.SVMWithSGD$.train(SVM.scala:219)
    at org.apache.spark.mllib.classification.SVMWithSGD$.train(SVM.scala:255)
    at org.apache.spark.mllib.classification.SVMWithSGD.train(SVM.scala)
    at SVMClass.main(SVMClass.java:31)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at com.intellij.rt.execution.application.AppMain.main(AppMain.java:144)

1 answer:

Answer 0 (score: 2)

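Not every ML algorithm can handle multiclass problems out of the box. When that is the case, you can always fall back on a one-vs.-rest (OvR) strategy. The Wiki article explains which algorithms have a "natural" extension to multiclass.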

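If you check the Spark docs, you'll notice that SVM is listed under binary classification. The SVM algorithm needs an extension to handle multiple classes, and apparently that extension is not implemented in MLlib (judging from the docs). You can work around this with the OvR strategy mentioned above, but your performance will not be great. A multilayer perceptron is an interesting alternative, since it can also give you the probability of belonging to a given class.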
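To make the OvR idea concrete, here is a minimal sketch in plain Java. It deliberately does not use the Spark API: the class name `OneVsRestSvmSketch`, its methods, the toy data, and the hyperparameters (200 epochs, step size 0.01, regularization 0.001) are all made up for illustration. It trains one binary linear SVM (hinge loss, SGD) per class, relabeling that class as +1 and everything else as -1, and predicts the class whose model returns the largest raw score. As far as I can tell, the "Input validation failed" error in the question comes from `SVMWithSGD` accepting only binary labels (0 or 1), which is exactly the relabeling step OvR takes care of.

```java
import java.util.Random;

/** Illustrative one-vs-rest wrapper around a tiny binary linear SVM. Not Spark code. */
final class OneVsRestSvmSketch {

    /** Raw decision value w.x + b; the bias b is stored as the last entry of w. */
    static double score(double[] w, double[] x) {
        double s = w[w.length - 1];
        for (int j = 0; j < x.length; j++) s += w[j] * x[j];
        return s;
    }

    /** Trains a binary linear SVM with SGD on the hinge loss; y[i] must be +1 or -1. */
    static double[] trainBinary(double[][] x, int[] y, int epochs,
                                double stepSize, double regParam, long seed) {
        int d = x[0].length;
        double[] w = new double[d + 1];
        Random rnd = new Random(seed);
        for (int e = 0; e < epochs; e++) {
            for (int t = 0; t < x.length; t++) {
                int i = rnd.nextInt(x.length);            // pick a random sample
                boolean violated = y[i] * score(w, x[i]) < 1.0;
                for (int j = 0; j < d; j++) {
                    // L2 shrinkage always; hinge gradient only when the margin is violated
                    double grad = regParam * w[j] - (violated ? y[i] * x[i][j] : 0.0);
                    w[j] -= stepSize * grad;
                }
                if (violated) w[d] += stepSize * y[i];    // bias is not regularized
            }
        }
        return w;
    }

    /** One-vs-rest: for each class k, relabel k as +1 and the rest as -1, then train. */
    static double[][] trainOneVsRest(double[][] x, int[] labels, int numClasses) {
        double[][] models = new double[numClasses][];
        for (int k = 0; k < numClasses; k++) {
            int[] binary = new int[labels.length];
            for (int i = 0; i < labels.length; i++) binary[i] = (labels[i] == k) ? 1 : -1;
            // Hyperparameters are arbitrary illustrative values, not tuned.
            models[k] = trainBinary(x, binary, 200, 0.01, 0.001, 42L + k);
        }
        return models;
    }

    /** Predicts the class whose binary model gives the largest raw score. */
    static int predict(double[][] models, double[] x) {
        int best = 0;
        for (int k = 1; k < models.length; k++) {
            if (score(models[k], x) > score(models[best], x)) best = k;
        }
        return best;
    }

    public static void main(String[] args) {
        // Toy data: three well-separated 2-D clusters with labels 0, 1, 2.
        double[][] x = {
            {0.0, 0.0}, {0.5, 0.2}, {0.1, 0.6}, {0.4, 0.4},
            {5.0, 0.0}, {5.5, 0.3}, {4.8, 0.5}, {5.2, 0.1},
            {0.0, 5.0}, {0.3, 5.4}, {0.5, 4.7}, {0.1, 5.2}
        };
        int[] labels = {0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2};
        double[][] models = trainOneVsRest(x, labels, 3);
        System.out.println(predict(models, new double[] {5.1, 0.2}));
    }
}
```

With K classes this trains K separate models, so training cost grows linearly with the number of classes, and the raw scores of independently trained models are not calibrated against each other. If you would rather stay inside Spark, the DataFrame-based `spark.ml` package has a `OneVsRest` meta-estimator that wraps a binary classifier; check the docs for your Spark version to see which base classifiers it accepts.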