Spark ClassCastException: JavaRDD cannot be cast to org.apache.spark.mllib.linalg.Vector

Time: 2016-10-13 09:00:36

Tags: java scala apache-spark

I want to fit an ARIMA time series model in Java. I have the following Scala code:

object SingleSeriesARIMA {
  def main(args: Array[String]): Unit = {
    // The dataset is sampled from an ARIMA(1, 0, 1) model generated in R.
    val lines = scala.io.Source.fromFile("../data/R_ARIMA_DataSet1.csv").getLines()
    val ts = Vectors.dense(lines.map(_.toDouble).toArray)
    val arimaModel = ARIMA.fitModel(1, 0, 1, ts)
    println("coefficients: " + arimaModel.coefficients.mkString(","))
    val forecast = arimaModel.forecast(ts, 20)
    println("forecast of next 20 observations: " + forecast.toArray.mkString(","))
  }
}

I tried the following Java version:

public class JavaARIMA {

    public static void main(String args[]) {
        System.setProperty("hadoop.home.dir", "C:/winutils");
        SparkConf conf = new SparkConf()
                .setAppName("Spark-TS Ticker Example")
                .setMaster("local")
                .set("spark.sql.warehouse.dir", "file:///C:/Users/devanshi/Downloads/Spark/sparkdemo/spark-warehouse/");
        JavaSparkContext context = new JavaSparkContext(conf);

        JavaRDD<String> lines = context.textFile("path/inputfile");

        JavaRDD<Vector> ts = lines.map(
            new Function<String, Vector>() {
                public Vector call(String s) {
                    String[] sarray = s.split(",");
                    double[] values = new double[sarray.length];
                    for (int i = 0; i < sarray.length; i++) {
                        values[i] = Double.parseDouble(sarray[i]);
                    }
                    return Vectors.dense(values);
                }
            }
        );
        double[] total = {1.0, 0.0, 1.0};
        //DenseVector dv = new DenseVector(total);
        //convert(dv,toBreeze());
        //ARIMAModel arimaModel = ARIMA.fitModel(1, 0, 1, dv, true, "css-cgd", null);
        ARIMAModel arimaModel = ARIMA.fitModel(1, 0, 1, (Vector) ts, false, "css-cgd", total);

        //  arimaModel = ARIMA.fitModel(1, 0, 1, ts);
        System.out.println("coefficients: " + arimaModel.coefficients());
        Vector forcst = arimaModel.forecast((Vector) ts, 20);
        System.out.println("forecast of next 20 observations: " + forcst);
    }
}

But I get:

Exception in thread "main" java.lang.ClassCastException:
org.apache.spark.api.java.JavaRDD cannot be cast to
org.apache.spark.mllib.linalg.Vector

Any help would be appreciated.

1 answer:

Answer 0 (score: 1)

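You cannot cast a JavaRDD to a Vector. A JavaRDD<Vector> is a distributed collection of Vectors, not a Vector itself, so you need to use rdd.foreach to get at each individual Vector. The code might look like this: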

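// Runs once per element of the RDD; each element is the Vector parsed from one input line.
ts.foreach(new VoidFunction<Vector>() {
    @Override
    public void call(Vector v) throws Exception {
        double[] total = { 1.0, 0.0, 1.0 };
        // v is already a Vector, so no cast is needed.
        ARIMAModel arimaModel = ARIMA.fitModel(1, 0, 1, v, false, "css-cgd", total);

        // coefficients() is a double[]; Arrays.toString prints its values rather than its reference.
        System.out.println("coefficients: " + java.util.Arrays.toString(arimaModel.coefficients()));
        Vector forcst = arimaModel.forecast(v, 20);
        System.out.println("forecast of next 20 observations: " + forcst);
    }
});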
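
As for why the original cast compiled at all: Vector is an interface (a Scala trait), and Java allows a cast from any non-final class to an interface, because some subclass might implement it. The check is deferred to runtime, where it fails. Here is a minimal sketch of that behavior without Spark; Vec, DenseVec and Wrapper are hypothetical stand-ins for Vector, DenseVector and JavaRDD, not real Spark classes.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for org.apache.spark.mllib.linalg.Vector.
interface Vec {
    double[] toArray();
}

// Hypothetical stand-in for a dense vector implementation.
class DenseVec implements Vec {
    private final double[] values;
    DenseVec(double[] values) { this.values = values; }
    public double[] toArray() { return values; }
}

// Hypothetical stand-in for JavaRDD<T>: it holds elements but is not one itself.
class Wrapper<T> {
    private final List<T> items;
    Wrapper(List<T> items) { this.items = items; }
    List<T> items() { return items; }
}

class CastDemo {
    // Returns true when casting the container itself to Vec fails at runtime.
    static boolean castFails(Wrapper<Vec> wrapper) {
        try {
            // Compiles, because Wrapper is not final and Vec is an interface.
            Vec v = (Vec) wrapper;
            return false;
        } catch (ClassCastException e) {
            return true;
        }
    }

    // Taking an element out of the container needs no cast at all.
    static Vec firstElement(Wrapper<Vec> wrapper) {
        return wrapper.items().get(0);
    }

    public static void main(String[] args) {
        Vec element = new DenseVec(new double[] {1.0, 2.0});
        Wrapper<Vec> wrapper = new Wrapper<>(Arrays.asList(element));
        System.out.println("cast of container fails: " + castFails(wrapper));
        System.out.println("first element length: " + firstElement(wrapper).toArray().length);
    }
}
```

The same reasoning applies to the real classes: the element type of the RDD is Vector, so you have to pull elements out of it (or process them inside foreach) instead of casting the RDD.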

Hope this helps.
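
One more thing worth noting: the Scala original never creates an RDD. It reads the whole file on the driver and builds a single Vector from it, one observation per line. If that is what you want, a closer Java translation could read the file locally. Below is a rough sketch of the parsing step in plain Java, assuming one numeric value per line; SeriesLoader, parseSeries and readSeries are hypothetical names, and the spark-ts calls appear only in comments (taken from the commented-out line in the question) because I have not run them here.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;

class SeriesLoader {
    // Parses one observation per line, like lines.map(_.toDouble) in the Scala code.
    // Unlike the Scala version, blank lines are skipped instead of raising an error.
    static double[] parseSeries(List<String> lines) {
        return lines.stream()
                .map(String::trim)
                .filter(s -> !s.isEmpty())
                .mapToDouble(Double::parseDouble)
                .toArray();
    }

    // Reads the whole file on the driver; no SparkContext is involved.
    static double[] readSeries(Path csv) throws IOException {
        return parseSeries(Files.readAllLines(csv));
    }

    public static void main(String[] args) throws IOException {
        double[] values = readSeries(Paths.get(args[0]));
        System.out.println("read " + values.length + " observations");
        // Assumption: with spark-ts on the classpath, the array becomes one Vector:
        //   Vector ts = Vectors.dense(values);
        //   ARIMAModel arimaModel = ARIMA.fitModel(1, 0, 1, ts, true, "css-cgd", null);
        //   Vector forecast = arimaModel.forecast(ts, 20);
    }
}
```

With this approach the whole series ends up in one Vector, whereas the RDD version in the question turns each input line into its own Vector and so fits one model per line.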