I have been using Spark 1.5.2 built with Hadoop 1.0.4, along with spark-csv_2.10-1.4.0 and commons-csv:1.1 (for reading data). I run Naive Bayes / random forest algorithms on CentOS 6.7, and everything works fine. After upgrading the OS to CentOS 7.3, I get the following exception:
Caused by: java.lang.ClassNotFoundException: Failed to load class for data source: parquet.
at org.apache.spark.sql.execution.datasources.ResolvedDataSource$.lookupDataSource(ResolvedDataSource.scala:67)
at org.apache.spark.sql.execution.datasources.ResolvedDataSource$.apply(ResolvedDataSource.scala:167)
at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:146)
at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:137)
at org.apache.spark.sql.DataFrameWriter.parquet(DataFrameWriter.scala:304)
at org.apache.spark.mllib.classification.NaiveBayesModel$SaveLoadV2_0$.save(NaiveBayes.scala:206)
at org.apache.spark.mllib.classification.NaiveBayesModel.save(NaiveBayes.scala:169)
at com.zlabs.ml.core.algo.classification.naivebayes.NaiveBayesClassificationImpl.createModel(NaiveBayesClassificationImpl.java:111)
... 10 more
The exception occurs only when I save the model; training completes successfully.
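For reference, I submit the job roughly like this (the jar paths, output jar name, and master URL below are illustrative, not my exact setup); the spark-csv and commons-csv dependencies are passed explicitly, and this command did not change across the OS upgrade:

```shell
# Illustrative spark-submit invocation; paths and master URL are hypothetical.
spark-submit \
  --class com.zlabs.ml.core.algo.classification.naivebayes.NaiveBayesClassificationImpl \
  --master spark://master:7077 \
  --jars /opt/libs/spark-csv_2.10-1.4.0.jar,/opt/libs/commons-csv-1.1.jar \
  myapp.jar
```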
Is anyone else facing this issue? Are there any OS-level package dependencies for Spark?