我正在尝试将csv文件作为JavaRDD字符串加载,然后想要在JavaRDD Vector中获取数据
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.Function;
import org.apache.spark.mllib.feature.HashingTF;
import org.apache.spark.mllib.linalg.Vector;
import org.apache.spark.mllib.linalg.Vectors;
import org.apache.spark.mllib.regression.LabeledPoint;
import org.apache.spark.mllib.stat.MultivariateStatisticalSummary;
import org.apache.spark.mllib.stat.Statistics;
import breeze.collection.mutable.SparseArray;
import scala.collection.immutable.Seq;
public class Trial {
public void start() throws InstantiationException, IllegalAccessException,
ClassNotFoundException {
run();
}
private void run(){
SparkConf conf = new SparkConf().setAppName("csvparser");
JavaSparkContext jsc = new JavaSparkContext(conf);
JavaRDD<String> data = jsc.textFile("C:/Users/kalraa2/Documents/trial.csv");
JavaRDD<Vector> datamain = data.flatMap(null);
MultivariateStatisticalSummary mat = Statistics.colStats(datamain.rdd());
System.out.println(mat.mean());
}
private List<Vector> Seq(Vector dv) {
// TODO Auto-generated method stub
return null;
}
public static void main(String[] args) throws Exception {
Trial trial = new Trial();
trial.start();
}
}
程序正在运行,没有任何错误,但是当我尝试在spark-machine上运行它时,我无法得到任何东西。谁能告诉我字符串RDD到Vector RDD的转换是否正确。
我的csv文件只包含一个浮点数列
答案 0 :(得分:1)
此null
调用中的flatMap
可能是个问题:
JavaRDD<Vector> datamain = data.flatMap(null);
答案 1 :(得分:0)
我通过将代码更改为此
来解决了我的问题JavaRDD<Vector> datamain = data.map(new Function<String,Vector>(){
public Vector call(String s){
String[] sarray = s.trim().split("\\r?\\n");
double[] values = new double[sarray.length];
for (int i = 0; i < sarray.length; i++) {
values[i] = Double.parseDouble(sarray[i]);
System.out.println(values[i]);
}
return Vectors.dense(values);
}
}
);
答案 2 :(得分:0)
假设您的<?php //index.php
require_once 'userrepository.php'
$repository = new UserRepository();
$repository->init();
文件看起来像这样
trial.csv
从您的问题中获取原始代码,Java 8需要进行一行更改
1.0
2.0
3.0
打印SparkConf conf = new SparkConf().setAppName("csvparser").setMaster("local");
JavaSparkContext jsc = new JavaSparkContext(conf);
JavaRDD<String> data = jsc.textFile("C:/Users/kalraa2/Documents/trial.csv");
JavaRDD<Vector> datamain = data.map(s -> Vectors.dense(Double.parseDouble(s)));
MultivariateStatisticalSummary mat = Statistics.colStats(datamain.rdd());
System.out.println(mat.mean());