I am trying to use Spark 2.3.1 with Java.
I followed the example in the documentation, but I keep getting a poorly described exception when calling .fit(trainingData):
Exception in thread "main" java.lang.IllegalArgumentException
at org.apache.xbean.asm5.ClassReader.<init>(Unknown Source)
at org.apache.xbean.asm5.ClassReader.<init>(Unknown Source)
at org.apache.xbean.asm5.ClassReader.<init>(Unknown Source)
at org.apache.spark.util.ClosureCleaner$.getClassReader(ClosureCleaner.scala:46)
at org.apache.spark.util.FieldAccessFinder$$anon$3$$anonfun$visitMethodInsn$2.apply(ClosureCleaner.scala:449)
at org.apache.spark.util.FieldAccessFinder$$anon$3$$anonfun$visitMethodInsn$2.apply(ClosureCleaner.scala:432)
at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:733)
at scala.collection.mutable.HashMap$$anon$1$$anonfun$foreach$2.apply(HashMap.scala:103)
at scala.collection.mutable.HashMap$$anon$1$$anonfun$foreach$2.apply(HashMap.scala:103)
at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:230)
at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:40)
at scala.collection.mutable.HashMap$$anon$1.foreach(HashMap.scala:103)
at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:732)
at org.apache.spark.util.FieldAccessFinder$$anon$3.visitMethodInsn(ClosureCleaner.scala:432)
at org.apache.xbean.asm5.ClassReader.a(Unknown Source)
at org.apache.xbean.asm5.ClassReader.b(Unknown Source)
at org.apache.xbean.asm5.ClassReader.accept(Unknown Source)
at org.apache.xbean.asm5.ClassReader.accept(Unknown Source)
at org.apache.spark.util.ClosureCleaner$$anonfun$org$apache$spark$util$ClosureCleaner$$clean$14.apply(ClosureCleaner.scala:262)
at org.apache.spark.util.ClosureCleaner$$anonfun$org$apache$spark$util$ClosureCleaner$$clean$14.apply(ClosureCleaner.scala:261)
at scala.collection.immutable.List.foreach(List.scala:381)
at org.apache.spark.util.ClosureCleaner$.org$apache$spark$util$ClosureCleaner$$clean(ClosureCleaner.scala:261)
at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:159)
at org.apache.spark.SparkContext.clean(SparkContext.scala:2299)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2073)
at org.apache.spark.rdd.RDD$$anonfun$take$1.apply(RDD.scala:1358)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
at org.apache.spark.rdd.RDD.withScope(RDD.scala:363)
at org.apache.spark.rdd.RDD.take(RDD.scala:1331)
at org.apache.spark.ml.tree.impl.DecisionTreeMetadata$.buildMetadata(DecisionTreeMetadata.scala:112)
at org.apache.spark.ml.tree.impl.RandomForest$.run(RandomForest.scala:105)
at org.apache.spark.ml.classification.DecisionTreeClassifier.train(DecisionTreeClassifier.scala:116)
at org.apache.spark.ml.classification.DecisionTreeClassifier.train(DecisionTreeClassifier.scala:45)
at org.apache.spark.ml.Predictor.fit(Predictor.scala:118)
at com.example.spark.MyApp.main(MyApp.java:36)
I am classifying this dummy dataset (data.csv):
f,label
1,1
1.5,1
0,0
2,2
2.5,2
My code:
SparkSession spark = SparkSession.builder()
        .master("local[1]")
        .appName("My App")
        .getOrCreate();

Dataset<Row> data = spark.read().format("csv")
        .option("header", "true")
        .option("inferSchema", "true")
        .load("C:\\tmp\\data.csv");
data.show(); // see output(1) below

VectorAssembler assembler = new VectorAssembler()
        .setInputCols(Collections.singletonList("f").toArray(new String[0]))
        .setOutputCol("features");
Dataset<Row> trainingData = assembler.transform(data)
        .select("features", "label");
trainingData.show(); // see output(2) below

DecisionTreeClassifier clf = new DecisionTreeClassifier();
DecisionTreeClassificationModel model = clf.fit(trainingData); // fails here (MyApp.java:36)
Dataset<Row> predictions = model.transform(trainingData);
predictions.show(); // never reached
Output (1):
+---+-----+
|  f|label|
+---+-----+
|1.0|    1|
|1.5|    1|
|0.0|    0|
|2.0|    2|
|2.5|    2|
+---+-----+
Output (2):
+--------+-----+
|features|label|
+--------+-----+
|   [1.0]|    1|
|   [1.5]|    1|
|   [0.0]|    0|
|   [2.0]|    2|
|   [2.5]|    2|
+--------+-----+
My build.gradle file looks like this:
plugins {
    id 'java'
    id 'application'
}

group 'com.example'
version '1.0-SNAPSHOT'
sourceCompatibility = 1.8
mainClassName = 'MyApp'

repositories {
    mavenCentral()
}

dependencies {
    compile group: 'org.apache.spark', name: 'spark-core_2.11', version: '2.3.1'
    compile group: 'org.apache.spark', name: 'spark-sql_2.11', version: '2.3.1'
    compile group: 'org.apache.spark', name: 'spark-mllib_2.11', version: '2.3.1'
}
What am I missing?
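Answer 0 (score: 11)
Which Java version is installed on your machine? Your problem is probably related to Java 9.
If you install Java 8 (for example jdk-8u171), the exception goes away, and output (3) of predictions.show() looks like this: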
+--------+-----+-------------+-------------+----------+
|features|label|rawPrediction|  probability|prediction|
+--------+-----+-------------+-------------+----------+
|   [1.0]|    1|[0.0,2.0,0.0]|[0.0,1.0,0.0]|       1.0|
|   [1.5]|    1|[0.0,2.0,0.0]|[0.0,1.0,0.0]|       1.0|
|   [0.0]|    0|[1.0,0.0,0.0]|[1.0,0.0,0.0]|       0.0|
|   [2.0]|    2|[0.0,0.0,2.0]|[0.0,0.0,1.0]|       2.0|
|   [2.5]|    2|[0.0,0.0,2.0]|[0.0,0.0,1.0]|       2.0|
+--------+-----+-------------+-------------+----------+
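To confirm which Java version the application actually runs on (which may differ from the JDK you believe is installed), a small check like the one below can help. This is only a sketch: the class name JavaVersionCheck and the helper majorVersion are made up for illustration, and the "expects Java 8" warning is based on this answer rather than on anything Spark itself exposes. The stack trace above looks consistent with the bundled xbean ASM 5 ClassReader rejecting class files newer than Java 8, but treat that as a likely explanation, not a confirmed one.

```java
public class JavaVersionCheck {

    /**
     * Extracts the major version from a java.version string.
     * Up to Java 8 the property has the form "1.8.0_171"; from Java 9
     * onwards it has the form "9.0.4" or "10".
     * (Hypothetical helper, not part of Spark or the JDK.)
     */
    static int majorVersion(String version) {
        String[] parts = version.split("[._\\-+]");
        int first = Integer.parseInt(parts[0]);
        // Legacy scheme: "1.x" means Java x.
        if (first == 1 && parts.length > 1) {
            return Integer.parseInt(parts[1]);
        }
        return first;
    }

    public static void main(String[] args) {
        String version = System.getProperty("java.version");
        int major = majorVersion(version);
        System.out.println("Running on Java " + version + " (major version " + major + ")");
        if (major != 8) {
            // Assumption taken from this answer: Spark 2.3.1 needs Java 8.
            System.err.println("Warning: this setup was reported to work with Java 8 only.");
        }
    }
}
```

Running this with the same launcher as the Spark application (the same Gradle task or IDE run configuration) shows the JVM that is really in use.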
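Answer 1 (score: 3)
I had the same problem. My system ran Spark 2.2.0 on Java 8, and we wanted to upgrade our server, but Spark 2.3.1 does not support Java 10 yet. In my case, I kept Java 8 on the Spark server and upgraded only Spark to 2.3.1.
I read about the topic in these articles: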
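Answer 2 (score: 0)
For those who have Java 8 installed but still get this exception: check which Java version Eclipse is pointing to. In the Eclipse IDE, go to
Window > Preferences > Java > Installed JREs
and check whether it points to JRE 1.8. If it does not, click Add > Standard VM > Next > (provide your JRE directory) > Finish.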