Spark Streaming transform function

Time: 2015-06-15 23:08:30

Tags: java apache-spark spark-streaming

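I am getting a compilation error from the transform function in Spark Streaming. Specifically, I seem to be missing something when declaring the final DStream variable, or something along those lines. I copied parts of this from the AMPLab tutorials, so I am a bit confused...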

The code is below; the problem is at the end, in the call to transform.

Here is the error:

[ERROR] /home/nipun/ngla-stable/online/src/main/java/org/necla/ngla/spark_streaming/Type4ViolationChecker.java:[120,63] error:
 no suitable method found for transform(<anonymous Function<JavaPairRDD<Long,Integer>,JavaPairRDD<Long,Integer>>>)
[INFO] 1 error

Code:

public class Type4ViolationChecker {

    private static final Pattern NEWSPACE = Pattern.compile("\n");

    public static Long generateTSKey(String line) throws ParseException{

        JSONObject obj = new JSONObject(line);
        String time = obj.getString("mts");
        DateFormat formatter = new SimpleDateFormat("yyyy / MM / dd HH : mm : ss");
        Date date = (Date)formatter.parse(time);

        long since = date.getTime();
        long key = (long)(since/10000) * 10000;

        return key;
    }

    public static void main(String[] args) {

        Type4ViolationChecker obj = new Type4ViolationChecker();

        SparkConf sparkConf = new SparkConf().setAppName("Type4ViolationChecker");
        final JavaStreamingContext ssc = new JavaStreamingContext(sparkConf, new Duration(10000));

        JavaReceiverInputDStream<String> lines = ssc.socketTextStream(args[0], Integer.parseInt(args[1]), StorageLevels.MEMORY_AND_DISK_SER);

        JavaDStream<String> words = lines.flatMap(new FlatMapFunction<String, String>() {
            @Override
            public Iterable<String> call(String x) {
                return Lists.newArrayList(NEWSPACE.split(x));
            }
        });

        words.persist();

        JavaDStream<String> matched = words.filter(new Function<String, Boolean>() {
            public Boolean call(String line) {
                return line.contains("pattern");
            }});

        JavaPairDStream<Long, Integer> keyValStream = matched.mapToPair(
                new PairFunction<String, Long, Integer>(){

                    /**
                     * Convert each matched line to a key/value tuple.
                     * Key -> time bucket: the timestamp in milliseconds since the 1970 GMT epoch,
                     *        rounded down to a multiple of the polling interval (10000 ms)
                     * Value -> the constant 1, so that reduceByKey can count the lines per bucket
                     */
                    @Override
                    public Tuple2<Long, Integer> call(String arg0)
                            throws Exception {
                        // TODO Auto-generated method stub
                        return new Tuple2<Long,Integer>(generateTSKey(arg0),1);
                    }

                });

        JavaPairDStream<Long, Integer> tsStream = keyValStream.reduceByKey(
                new Function2<Integer,Integer,Integer>(){
                    public Integer call(Integer i1, Integer i2){
                        return i1+ i2;
                    }});

        JavaPairDStream<Long,Integer> sortedtsStream = tsStream.transform(
                new Function<JavaPairRDD<Long, Integer>, JavaPairRDD<Long,Integer>>() {

                    @Override
                    public JavaPairRDD<Long, Integer> call(JavaPairRDD<Long, Integer> longIntegerJavaPairRDD) throws Exception {
                        return longIntegerJavaPairRDD.sortByKey(false);
                    }
                });

        //sortedtsStream.print();

        ssc.start();
        ssc.awaitTermination();

    }
}
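
For readers who want to try the key computation from generateTSKey on its own, here is a minimal sketch using only the JDK. The class name TimeBucket and the method bucketStart are hypothetical names chosen for illustration; they do not appear in the code above. The sketch uses Math.floorDiv, which gives the same result as the plain integer division above for timestamps after 1970 and also rounds down for negative values.

```java
// Hypothetical helper for illustration only; not part of the code above.
final class TimeBucket {

    // Round a timestamp (milliseconds since the epoch) down to the start
    // of the bucket of width intervalMillis that contains it.
    static long bucketStart(long epochMillis, long intervalMillis) {
        return Math.floorDiv(epochMillis, intervalMillis) * intervalMillis;
    }

    public static void main(String[] args) {
        // Two timestamps less than 10 seconds apart that fall in the same bucket.
        System.out.println(bucketStart(1434409710123L, 10000L)); // prints 1434409710000
        System.out.println(bucketStart(1434409719999L, 10000L)); // prints 1434409710000
        // The next millisecond starts a new bucket.
        System.out.println(bucketStart(1434409720000L, 10000L)); // prints 1434409720000
    }
}
```

All lines whose timestamps fall inside the same 10-second window get the same key, which is what lets reduceByKey add up the 1s into a per-bucket count.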

1 answer:

Answer 0: (score: 0)

Thanks to @GaborBakos for the answer... The following seems to work! I had to use transformToPair instead of transform.

    JavaPairDStream<Long,Integer> sortedtsStream = tsStream.transformToPair(
            new Function<JavaPairRDD<Long, Integer>, JavaPairRDD<Long,Integer>>() {
                @Override
                public JavaPairRDD<Long, Integer> call(JavaPairRDD<Long, Integer> longIntegerJavaPairRDD) throws Exception {
                    return longIntegerJavaPairRDD.sortByKey(true);
                }
            });
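
As far as I understand the Spark 1.x Java API, the reason is the shape of the two methods: transform expects a function that returns a JavaRDD, while transformToPair expects one that returns a JavaPairRDD, and JavaPairRDD is not a subclass of JavaRDD. A function returning a JavaPairRDD therefore matches no transform overload, which is what "no suitable method found" is reporting. Below is a toy model of that situation using only the JDK. ToyRdd, ToyPairRdd, ToyPairStream and TransformDemo are hypothetical stand-ins written for this sketch, not Spark classes, and the stream holds a single batch to keep it short.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical stand-in for a plain RDD (NOT a Spark class).
final class ToyRdd<T> {
    final List<T> items;
    ToyRdd(List<T> items) { this.items = items; }
}

// Hypothetical stand-in for a pair RDD. It deliberately does not extend ToyRdd.
final class ToyPairRdd<K extends Comparable<K>, V> {
    final List<Map.Entry<K, V>> entries;
    ToyPairRdd(List<Map.Entry<K, V>> entries) { this.entries = entries; }

    // Return a new pair RDD sorted by key, ascending or descending.
    ToyPairRdd<K, V> sortByKey(boolean ascending) {
        List<Map.Entry<K, V>> sorted = new ArrayList<>(entries);
        Comparator<Map.Entry<K, V>> byKey = Map.Entry.comparingByKey();
        sorted.sort(ascending ? byKey : byKey.reversed());
        return new ToyPairRdd<>(sorted);
    }
}

// Hypothetical stand-in for a pair stream holding one batch.
final class ToyPairStream<K extends Comparable<K>, V> {
    final ToyPairRdd<K, V> batch;
    ToyPairStream(ToyPairRdd<K, V> batch) { this.batch = batch; }

    // Same shape as transform: the function must return a plain RDD.
    <U> ToyRdd<U> transform(Function<ToyPairRdd<K, V>, ToyRdd<U>> f) {
        return f.apply(batch);
    }

    // Same shape as transformToPair: the function returns a pair RDD.
    <K2 extends Comparable<K2>, V2> ToyPairStream<K2, V2> transformToPair(
            Function<ToyPairRdd<K, V>, ToyPairRdd<K2, V2>> f) {
        return new ToyPairStream<>(f.apply(batch));
    }
}

final class TransformDemo {

    static List<Long> sortedKeys(boolean ascending) {
        List<Map.Entry<Long, Integer>> data = new ArrayList<>();
        data.add(new SimpleEntry<>(20000L, 3));
        data.add(new SimpleEntry<>(10000L, 5));
        data.add(new SimpleEntry<>(30000L, 1));
        ToyPairStream<Long, Integer> ts = new ToyPairStream<>(new ToyPairRdd<>(data));

        // ts.transform(rdd -> rdd.sortByKey(ascending)) would not compile here:
        // sortByKey returns a ToyPairRdd, which is not a ToyRdd.
        ToyPairStream<Long, Integer> sorted =
                ts.transformToPair(rdd -> rdd.sortByKey(ascending));

        List<Long> keys = new ArrayList<>();
        for (Map.Entry<Long, Integer> e : sorted.batch.entries) {
            keys.add(e.getKey());
        }
        return keys;
    }

    public static void main(String[] args) {
        System.out.println(sortedKeys(true));  // prints [10000, 20000, 30000]
        System.out.println(sortedKeys(false)); // prints [30000, 20000, 10000]
    }
}
```

Note that the snippet in this answer calls sortByKey(true), which sorts ascending, whereas the question used sortByKey(false) for descending order. The switch to transformToPair is what fixes the compile error; the sort direction is independent of it.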