I'm using Spark 2.2.1 locally on a Mac and am trying to reproduce the Naive Bayes classification example from https://spark.apache.org/docs/2.2.1/ml-classification-regression.html#naive-bayes, but I can't load the sample data:
import org.apache.spark.ml.classification.NaiveBayes
import org.apache.spark.ml.evaluation.MulticlassClassificationEvaluator
// Load the data stored in LIBSVM format as a DataFrame.
val data = spark.read.format("libsvm").load("data/mllib/sample_libsvm_data.txt")
The code above throws this error:
org.apache.spark.sql.AnalysisException: Path does not exist: file:/Users/my_user_name/data/mllib/sample_libsvm_data.txt;
at org.apache.spark.sql.execution.datasources.DataSource$.org$apache$spark$sql$execution$datasources$DataSource$$checkAndGlobPathIfNecessary(DataSource.scala:626)
at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$14.apply(DataSource.scala:350)
at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$14.apply(DataSource.scala:350)
at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
at scala.collection.immutable.List.foreach(List.scala:381)
at scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:241)
at scala.collection.immutable.List.flatMap(List.scala:344)
at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:349)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:178)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:156)
... 50 elided
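The path in this exception shows what went wrong. In a local setup, the relative path from the docs was resolved against the current working directory (here /Users/my_user_name). The docs, however, write it relative to the Spark installation directory, where the distribution keeps its sample files under data/mllib/. Below is a minimal sketch of resolving such a path explicitly. It assumes SPARK_HOME points at the unpacked Spark distribution. SparkSampleData and its methods are hypothetical helpers, not part of Spark:

```scala
import java.nio.file.{Files, Path, Paths}

object SparkSampleData {
  // Resolve a path that the Spark docs give relative to the Spark
  // installation directory, e.g. "data/mllib/sample_libsvm_data.txt".
  // Assumption: SPARK_HOME points at the unpacked Spark distribution.
  // Without it, fall back to the current working directory, which is what
  // a bare relative path passed to spark.read...load(...) resolves against
  // in a local setup.
  def resolve(relative: String,
              sparkHome: Option[String] = sys.env.get("SPARK_HOME")): Path = {
    val base = sparkHome
      .map(home => Paths.get(home))
      .getOrElse(Paths.get(System.getProperty("user.dir")))
    base.resolve(relative).toAbsolutePath().normalize()
  }

  // Like resolve, but fails early with a readable message instead of
  // Spark's "Path does not exist" AnalysisException.
  def resolveExisting(relative: String,
                      sparkHome: Option[String] = sys.env.get("SPARK_HOME")): String = {
    val path = resolve(relative, sparkHome)
    require(Files.exists(path), s"Sample file not found at $path; check SPARK_HOME")
    path.toString
  }
}
```

In spark-shell you would then call `spark.read.format("libsvm").load(SparkSampleData.resolveExisting("data/mllib/sample_libsvm_data.txt"))`. You can get the same result without any helper by starting spark-shell from the Spark installation directory or by passing the file's absolute path.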
How can I load this data so that I can continue with the analysis?
Answer 0 (score: 0)
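You can load it into an RDD first. Note that a relative path like the one below is resolved against the directory you started spark-shell from, so either run it from the Spark installation directory or pass the file's full path: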
val textFile = sc.textFile("data/mllib/sample_libsvm_data.txt")
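Then convert it to a DataFrame (this needs spark.implicits._ in scope, which spark-shell imports automatically). Note that toDF takes column names, not a StructType schema. The result is a single string column holding the raw lines, which you still have to parse into a label and a feature vector:
val df = textFile.toDF("line")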
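If you take the RDD route, each LIBSVM line has the form `label index:value index:value ...`, with one-based, ascending indices. It still has to be turned into a label and a sparse feature vector before NaiveBayes can use it. Below is a minimal pure-Scala sketch of parsing one line. LibSvmLine and Record are hypothetical names, and the shift to zero-based indices mirrors what Spark's own libsvm reader does:

```scala
object LibSvmLine {
  // One parsed LIBSVM record: a label plus a sparse feature vector stored as
  // parallel arrays of zero-based indices and their values.
  final case class Record(label: Double, indices: Array[Int], values: Array[Double])

  // Parse "label idx:val idx:val ...". The LIBSVM format uses one-based
  // indices; they are shifted to zero-based here, as Spark's reader does.
  // Blank lines and "#" comment lines should be filtered out beforehand.
  def parse(line: String): Record = {
    val tokens = line.trim.split("\\s+")
    val label = tokens.head.toDouble
    val pairs = tokens.tail.map { tok =>
      val parts = tok.split(":")
      require(parts.length == 2, s"Malformed feature '$tok' in line: $line")
      (parts(0).toInt - 1, parts(1).toDouble)
    }
    val (indices, values) = pairs.unzip
    Record(label, indices, values)
  }
}
```

You could apply it with something like `textFile.map(_.trim).filter(l => l.nonEmpty && !l.startsWith("#")).map(LibSvmLine.parse)`. You would then build each vector with `org.apache.spark.ml.linalg.Vectors.sparse(numFeatures, r.indices, r.values)`, where numFeatures is a feature count you have to know in advance. The built-in `libsvm` data source already does all of this, so fixing the path and keeping `spark.read.format("libsvm")` is usually the simpler route.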