How to solve the following problem in Apache Spark

Asked: 2017-11-27 10:25:08

Tags: java apache-spark

Consider a retail scenario in which a (K,V) input array contains (product name, price) pairs, as shown below. A discount of 500 needs to be subtracted from the value of each key.

Use Spark logic to implement the above requirement.

Input:

{(Jeans,2000),(Smart phone,10000),(Watch,3000)}

Expected output:

{(Jeans,1500),(Smart phone,9500),(Watch,2500)}

I have tried the following code, but I am getting errors. Please help me fix them.

import java.util.Arrays;
import java.util.Iterator;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.FlatMapFunction;
import org.apache.spark.api.java.function.Function;
import org.apache.spark.api.java.function.Function2;
import org.apache.spark.api.java.function.PairFunction;

import scala.Tuple2;

public class PairRDDAgg {

public static void main(String[] args) {
    SparkConf conf = new SparkConf().setAppName("Line_Count").setMaster("local");
    JavaSparkContext sc = new JavaSparkContext(conf);

    JavaRDD<String> input = sc.textFile("C:/Users/xxxx/Documents/retail.txt");

    JavaPairRDD<String, Integer> counts = input.mapValues(new Function() {
        private static final long serialVersionUID = 1L;

        public Integer call(Integer i) {
            return (i - 500);
        }
    });

    System.out.println(counts.collect());

    sc.close();
}
}

2 Answers:

Answer 0 (score: 1):

Use the mapValues() function.

An example for your scenario is:

rdd.mapValues(x => x-500);
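
In the Java API used in the question, mapValues() is defined on JavaPairRDD rather than JavaRDD, so the lines read from the file must first be parsed into pairs with mapToPair(). A minimal corrected sketch of the question's program, assuming retail.txt contains comma-separated lines such as Jeans,2000:

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

import scala.Tuple2;

public class PairRDDAgg {

    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("Line_Count").setMaster("local");
        JavaSparkContext sc = new JavaSparkContext(conf);

        JavaRDD<String> input = sc.textFile("C:/Users/xxxx/Documents/retail.txt");

        // Parse each line "name,price" into a (name, price) pair.
        JavaPairRDD<String, Integer> pairs = input.mapToPair(line -> {
            String[] parts = line.split(",");
            return new Tuple2<>(parts[0], Integer.parseInt(parts[1]));
        });

        // Subtract the discount of 500 from every value; keys stay unchanged.
        JavaPairRDD<String, Integer> discounted = pairs.mapValues(price -> price - 500);

        System.out.println(discounted.collect());

        sc.close();
    }
}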

Answer 1 (score: 0):

You can try this:

scala> val dataset = spark.createDataset(Seq(("Jeans",2000),("Smart phone",10000),("Watch",3000)))
dataset: org.apache.spark.sql.Dataset[(String, Int)] = [_1: string, _2: int]

scala> dataset.map ( x => (x._1, x._2 - 500) ).show

+-----------+----+
|         _1|  _2|
+-----------+----+
|      Jeans|1500|
|Smart phone|9500|
|      Watch|2500|
+-----------+----+
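
Since the question is written against the Java API, a rough Java equivalent of this Dataset approach (a sketch, assuming Spark 2.x and a local SparkSession) would be:

import java.util.Arrays;

import org.apache.spark.api.java.function.MapFunction;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Encoders;
import org.apache.spark.sql.SparkSession;

import scala.Tuple2;

public class DatasetDiscount {

    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("DatasetDiscount")
                .master("local")
                .getOrCreate();

        // Build the (product, price) dataset inline, mirroring the Scala example.
        Dataset<Tuple2<String, Integer>> dataset = spark.createDataset(
                Arrays.asList(
                        new Tuple2<>("Jeans", 2000),
                        new Tuple2<>("Smart phone", 10000),
                        new Tuple2<>("Watch", 3000)),
                Encoders.tuple(Encoders.STRING(), Encoders.INT()));

        // Subtract the 500 discount from each price; the explicit cast selects
        // the MapFunction overload of map() rather than the Scala Function1 one.
        dataset.map(
                (MapFunction<Tuple2<String, Integer>, Tuple2<String, Integer>>)
                        t -> new Tuple2<>(t._1(), t._2() - 500),
                Encoders.tuple(Encoders.STRING(), Encoders.INT()))
            .show();

        spark.stop();
    }
}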