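Could someone help me convert Cassandra row data into a Vector using Java?
I am reading data from a Cassandra table and trying to build a RandomForestClassifier model from it. The failing code is shown below: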
JavaRDD<VibrationData1> vibrationReadRdd = CassandraJavaUtil
        .javaFunctions(sc)
        .cassandraTable("vibration_ks", "vibration_data_feature",
                CassandraJavaUtil.mapRowTo(VibrationData1.class))
        .select("classifier", "features");
System.out.println("vibrationRdd.count() --------- "
        + vibrationReadRdd.count());

SQLContext sqlContext = new SQLContext(sc);
DataFrame data = sqlContext.createDataFrame(vibrationReadRdd,
        VibrationData1.class);

StringIndexerModel labelIndexer = new StringIndexer()
        .setInputCol("classifier").setOutputCol("indexedLabel")
        .fit(data);

// Automatically identify categorical features, and index them.
// Set maxCategories so features with > 4 distinct values are treated as
// continuous.
VectorIndexerModel featureIndexer = new VectorIndexer()
        .setInputCol("features").setOutputCol("indexedFeatures")
        .setMaxCategories(4).fit(data);
The error is thrown at this line:
VectorIndexerModel featureIndexer = new VectorIndexer().setInputCol("features").setOutputCol("indexedFeatures").setMaxCategories(4).fit(data);
with the following error message:
17/01/18 13:15:49 INFO DAGScheduler: Job 2 finished: countByValue at StringIndexer.scala:86, took 3.331207 s
Exception in thread "main" java.lang.IllegalArgumentException: requirement failed: Column features must be of type org.apache.spark.mllib.linalg.VectorUDT@f71b0bce but was actually StringType.
    at scala.Predef$.require(Predef.scala:233)
    at org.apache.spark.ml.util.SchemaUtils$.checkColumnType(SchemaUtils.scala:42)
    at org.apache.spark.ml.feature.VectorIndexer.transformSchema(VectorIndexer.scala:134)
    at org.apache.spark.ml.PipelineStage.transformSchema(Pipeline.scala:68)
    at org.apache.spark.ml.feature.VectorIndexer.fit(VectorIndexer.scala:112)
    at com.edge.database.test.VibrationDBUtil.createSchema(VibrationDBUtil.java:161)
    at com.edge.database.test.VibrationDBUtil.run(VibrationDBUtil.java:95)
    at com.edge.database.test.VibrationDBUtil.main(VibrationDBUtil.java:217)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
The Cassandra table looks like this:
cqlsh:vibration_ks> select * from vibration_data_feature;

 uid | classifier | features
-----+------------+-----------------------------------------
  15 |          0 | 10:0.0092 11:0.0093 20:0.0094 21:0.0095
  17 |          1 | 10:0.192 11:0.3093 20:0.8094 21:0.9095
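The error says the features column reaches VectorIndexer as a plain string, so each cell has to be turned into a vector first. Below is a minimal sketch of the parsing step only, assuming the features column holds whitespace-separated index:value pairs exactly as in the rows above and that the indices are already 0-based. The class SparseFeatures and its methods are hypothetical names; the code uses only the JDK and produces the ascending index array and matching value array that a sparse vector needs.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: parses a cell such as "10:0.0092 11:0.0093 20:0.0094 21:0.0095"
// into sorted index/value arrays. Assumes 0-based indices, each appearing at most once.
final class SparseFeatures {
    final int[] indices;
    final double[] values;

    private SparseFeatures(int[] indices, double[] values) {
        this.indices = indices;
        this.values = values;
    }

    static SparseFeatures parse(String text) {
        // TreeMap keeps the indices in ascending order, which sparse vectors require.
        TreeMap<Integer, Double> entries = new TreeMap<>();
        String trimmed = text.trim();
        if (!trimmed.isEmpty()) {
            for (String token : trimmed.split("\\s+")) {
                int colon = token.indexOf(':');
                // Reject tokens with no colon, no index part, or no value part.
                if (colon <= 0 || colon == token.length() - 1) {
                    throw new IllegalArgumentException("Malformed pair: " + token);
                }
                int index = Integer.parseInt(token.substring(0, colon));
                double value = Double.parseDouble(token.substring(colon + 1));
                if (index < 0) {
                    throw new IllegalArgumentException("Negative index: " + token);
                }
                if (entries.put(index, value) != null) {
                    throw new IllegalArgumentException("Duplicate index: " + index);
                }
            }
        }
        int[] idx = new int[entries.size()];
        double[] vals = new double[entries.size()];
        int i = 0;
        for (Map.Entry<Integer, Double> e : entries.entrySet()) {
            idx[i] = e.getKey();
            vals[i] = e.getValue();
            i++;
        }
        return new SparseFeatures(idx, vals);
    }

    // Smallest vector length that can hold every parsed index (max index + 1).
    int minSize() {
        return indices.length == 0 ? 0 : indices[indices.length - 1] + 1;
    }
}
```

Inside the Spark job, the two arrays could then be passed to org.apache.spark.mllib.linalg.Vectors.sparse(size, indices, values), for example in a map over vibrationReadRdd that produces a Vector-typed features column instead of a String one; the mllib package matches the VectorUDT named in the error. VectorIndexer expects every row to have the same vector length, so size should be one fixed feature count (at least minSize() for every row) rather than a per-row value.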
Thanks in advance.