Spark Cassandra: java.lang.IllegalArgumentException: requirement failed: Column features must be of type org.apache.spark.mllib.linalg.VectorUDT

Time: 2017-01-18 07:53:10

Tags: apache-spark

Can someone help me convert the data in a Cassandra row into a Vector using Java?

The failing part of the code is shown below.

I am reading data from a Cassandra table and trying to build a RandomForestClassifier model as follows:

    JavaRDD<VibrationData1> vibrationReadRdd = CassandraJavaUtil
            .javaFunctions(sc)
            .cassandraTable("vibration_ks", "vibration_data_feature",
                    CassandraJavaUtil.mapRowTo(VibrationData1.class))
            .select("classifier", "features");

    System.out.println("vibrationRdd.count()   --------- "
            + vibrationReadRdd.count());

    SQLContext sqlContext = new SQLContext(sc);

    DataFrame data = sqlContext.createDataFrame(vibrationReadRdd,
            VibrationData1.class);

    StringIndexerModel labelIndexer = new StringIndexer()
            .setInputCol("classifier").setOutputCol("indexedLabel")
            .fit(data);
    // Automatically identify categorical features, and index them.
    // Set maxCategories so features with > 4 distinct values are treated as
    // continuous.
    VectorIndexerModel featureIndexer = new VectorIndexer()
            .setInputCol("features").setOutputCol("indexedFeatures")
            .setMaxCategories(4).fit(data);
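The comment in the code describes how `VectorIndexer` decides which features are categorical. Below is a simplified, hypothetical plain-Java sketch of that idea with no Spark dependency (the class `CategoricalFeatureSketch` and its method are made up for illustration): a feature with at most `maxCategories` distinct values has its values re-coded as 0-based category indices, and every other feature stays continuous. Here indices are simply assigned in ascending value order; Spark's own assignment rules may differ in detail, so treat it as an illustration of the concept only.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical, simplified illustration of the idea behind
// VectorIndexer.setMaxCategories: a feature with at most maxCategories
// distinct values is treated as categorical and its values are re-coded as
// 0-based category indices (here in ascending value order); every other
// feature is left continuous. This is not Spark's implementation.
final class CategoricalFeatureSketch {

    /**
     * Returns, for each feature judged categorical, a map from raw value to
     * category index. Feature positions missing from the result are continuous.
     */
    static Map<Integer, Map<Double, Integer>> categoryMaps(List<double[]> rows, int maxCategories) {
        if (rows.isEmpty()) {
            throw new IllegalArgumentException("need at least one row");
        }
        int numFeatures = rows.get(0).length;
        for (double[] row : rows) {
            // Every row must have the same number of features.
            if (row.length != numFeatures) {
                throw new IllegalArgumentException("expected " + numFeatures + " features, got " + row.length);
            }
        }
        Map<Integer, Map<Double, Integer>> result = new TreeMap<>();
        for (int f = 0; f < numFeatures; f++) {
            TreeSet<Double> distinct = new TreeSet<>(); // sorted distinct values of feature f
            for (double[] row : rows) {
                distinct.add(row[f]);
                if (distinct.size() > maxCategories) {
                    break; // too many values: feature f stays continuous
                }
            }
            if (distinct.size() <= maxCategories) {
                Map<Double, Integer> codes = new TreeMap<>();
                int next = 0;
                for (double value : distinct) {
                    codes.put(value, next++);
                }
                result.put(f, codes);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<double[]> rows = Arrays.asList(
                new double[] {0.0, 1.5, 3.0},
                new double[] {1.0, 2.5, 3.0},
                new double[] {0.0, 3.5, 3.0},
                new double[] {1.0, 4.5, 3.0},
                new double[] {0.0, 5.5, 3.0});
        // Feature 0 has 2 distinct values, feature 1 has 5, feature 2 has 1.
        Map<Integer, Map<Double, Integer>> maps = categoryMaps(rows, 4);
        System.out.println(maps); // {0={0.0=0, 1.0=1}, 2={3.0=0}}
    }
}
```

With `maxCategories = 4`, feature 1 (five distinct values) is left continuous, which matches the intent of the `setMaxCategories(4)` comment.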

The error occurs at this line:

    VectorIndexerModel featureIndexer = new VectorIndexer().setInputCol("features").setOutputCol("indexedFeatures").setMaxCategories(4).fit(data);

with the following error message:

    17/01/18 13:15:49 INFO DAGScheduler: Job 2 finished: countByValue at StringIndexer.scala:86, took 3.331207 s
    Exception in thread "main" java.lang.IllegalArgumentException: requirement failed: Column features must be of type org.apache.spark.mllib.linalg.VectorUDT@f71b0bce but was actually StringType.
        at scala.Predef$.require(Predef.scala:233)
        at org.apache.spark.ml.util.SchemaUtils$.checkColumnType(SchemaUtils.scala:42)
        at org.apache.spark.ml.feature.VectorIndexer.transformSchema(VectorIndexer.scala:134)
        at org.apache.spark.ml.PipelineStage.transformSchema(Pipeline.scala:68)
        at org.apache.spark.ml.feature.VectorIndexer.fit(VectorIndexer.scala:112)
        at com.edge.database.test.VibrationDBUtil.createSchema(VibrationDBUtil.java:161)
        at com.edge.database.test.VibrationDBUtil.run(VibrationDBUtil.java:95)
        at com.edge.database.test.VibrationDBUtil.main(VibrationDBUtil.java:217)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
        at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
        at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
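The message says that `VectorIndexer.fit` found the `features` column typed as `StringType`, while it requires Spark's vector type (`VectorUDT`), so each string has to become a vector before `fit` is called. Assuming the Cassandra `features` column holds space-separated `index:value` pairs, as in the `vibration_data_feature` rows listed in this question, a minimal plain-Java sketch of the parsing step could look like this. The class `SparseFeatureParser` is hypothetical and has no Spark dependency; it only produces the sorted index and value arrays that a sparse vector needs.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.TreeMap;

// Parses a features string such as "10:0.0092 11:0.0093 20:0.0094 21:0.0095"
// (assumed format: space-separated index:value pairs) into sorted
// indices/values arrays suitable for building a sparse vector.
final class SparseFeatureParser {

    /** Parsed sparse features: indices strictly increasing, values aligned with them. */
    static final class Parsed {
        final int[] indices;
        final double[] values;

        Parsed(int[] indices, double[] values) {
            this.indices = indices;
            this.values = values;
        }
    }

    static Parsed parse(String text) {
        // TreeMap keeps the pairs sorted by index, as sparse vectors require.
        TreeMap<Integer, Double> pairs = new TreeMap<>();
        String trimmed = text.trim();
        if (!trimmed.isEmpty()) {
            for (String token : trimmed.split("\\s+")) {
                int colon = token.indexOf(':');
                if (colon <= 0 || colon == token.length() - 1) {
                    throw new IllegalArgumentException("expected index:value but got '" + token + "'");
                }
                int index = Integer.parseInt(token.substring(0, colon));
                double value = Double.parseDouble(token.substring(colon + 1));
                if (index < 0) {
                    throw new IllegalArgumentException("negative index in '" + token + "'");
                }
                if (pairs.put(index, value) != null) {
                    throw new IllegalArgumentException("duplicate index " + index);
                }
            }
        }
        int[] indices = new int[pairs.size()];
        double[] values = new double[pairs.size()];
        int i = 0;
        for (Map.Entry<Integer, Double> e : pairs.entrySet()) {
            indices[i] = e.getKey();
            values[i] = e.getValue();
            i++;
        }
        return new Parsed(indices, values);
    }

    public static void main(String[] args) {
        Parsed p = parse("10:0.0092 11:0.0093 20:0.0094 21:0.0095");
        System.out.println(Arrays.toString(p.indices)); // [10, 11, 20, 21]
        System.out.println(Arrays.toString(p.values));  // [0.0092, 0.0093, 0.0094, 0.0095]
    }
}
```

In Spark 1.x the resulting arrays could then be passed to `Vectors.sparse(size, indices, values)` from `org.apache.spark.mllib.linalg`, and the DataFrame built from `Row`s with a `StructType` that declares the column as `new VectorUDT()` (via `sqlContext.createDataFrame(rowRdd, schema)`), instead of letting bean reflection infer `StringType` from a `String` field. That wiring is a suggestion to verify against the Spark version in use, not a tested fix.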

The Cassandra table looks like this:

    cqlsh:vibration_ks> select * from vibration_data_feature;

     uid | classifier | features
    -----+------------+-----------------------------------------
      15 |          0 | 10:0.0092 11:0.0093 20:0.0094 21:0.0095
      17 |          1 | 10:0.192 11:0.3093 20:0.8094 21:0.9095
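A sparse vector also needs an explicit size larger than its highest index, and, as far as I know, `VectorIndexer` expects every vector in the column to have the same size. If the true number of features is not known from elsewhere, one assumption-based option is to derive a single shared size from the largest index across all rows. A minimal sketch (the class `FeatureVectorSize` is hypothetical, and it assumes the same `index:value` format as the table):

```java
import java.util.Arrays;
import java.util.List;

// Computes one shared vector size for a set of "index:value" feature strings:
// (largest index seen in any row) + 1. Assumes the space-separated format of
// the features column; the class name is hypothetical.
final class FeatureVectorSize {

    static int sharedSize(List<String> featureStrings) {
        int maxIndex = -1;
        for (String text : featureStrings) {
            String trimmed = text.trim();
            if (trimmed.isEmpty()) {
                continue; // a row with no non-zero features does not change the size
            }
            for (String token : trimmed.split("\\s+")) {
                int colon = token.indexOf(':');
                if (colon <= 0) {
                    throw new IllegalArgumentException("expected index:value but got '" + token + "'");
                }
                maxIndex = Math.max(maxIndex, Integer.parseInt(token.substring(0, colon)));
            }
        }
        return maxIndex + 1; // 0 if no row has any features
    }

    public static void main(String[] args) {
        List<String> rows = Arrays.asList(
                "10:0.0092 11:0.0093 20:0.0094 21:0.0095",
                "10:0.192 11:0.3093 20:0.8094 21:0.9095");
        System.out.println(sharedSize(rows)); // 22
    }
}
```

For the two rows shown, the highest index is 21, so every vector would get size 22. If more features exist than the sample reveals, a fixed, known feature count is the safer choice.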

Thanks in advance.
