How do I convert a HashMap to a JavaPairRDD in Spark?

Time: 2015-07-25 16:24:53

Tags: java apache-spark

I am new to Apache Spark. I am trying to create a JavaPairRDD from a HashMap. My HashMap has the type HashMap<String, HashMap<Integer, String>>. How can I convert it to a JavaPairRDD? I have pasted my code below:

HashMap<String, HashMap<Integer,String>> canlist =
    new HashMap<String, HashMap<Integer,String>>();

for (String key : entityKey)
{
    HashMap<Integer, String> clkey = new HashMap<Integer, String>();
    int f = 0;
    for (String val : mentionKey)
    {
        //do something
        simiscore = (longerLength - costs[m.length()]) / (double) longerLength;

        if (simiscore > 0.6) {
            clkey.put(v1, val);
            System.out.print(
                " The mention  " + val + " added to link entity  " + key);
        }
        f++;
        System.out.println("Scan Completed");
    }
    canlist.put(key, clkey);
    JavaPairRDD<String, HashMap<Integer, String>> rad;
    rad = context.parallelize(scala.collection.Seq(toScalaMap(canlist)));

}
public static <String,Object> Map<String,Object> toScalaMap(HashMap<String,Object> m) {
    return (Map<String,Object>) JavaConverters.mapAsScalaMapConverter(m).asScala().toMap(
            Predef.<Tuple2<String,Object>>conforms()
    );}
}

3 Answers:

Answer 0: (score: 8)

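If you convert your HashMap into a List<scala.Tuple2<K, V>> (for the map in the question, a List<Tuple2<String, HashMap<Integer, String>>>), you can use JavaSparkContext.parallelizePairs to create the JavaPairRDD.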
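The conversion step in this answer can be sketched in plain Java. This is only an illustration under stated assumptions: the class `MapToPairList` and the helper `toPairList` are hypothetical names, not part of Spark, and `AbstractMap.SimpleEntry` stands in for `scala.Tuple2` purely so the block runs without Spark on the classpath. The sample entity and mention strings are made up.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiFunction;

class MapToPairList {

    // Hypothetical helper, not a Spark API: builds one pair object per map entry.
    public static <K, V, P> List<P> toPairList(Map<K, V> map, BiFunction<K, V, P> pairFactory) {
        List<P> pairs = new ArrayList<>(map.size());
        for (Map.Entry<K, V> entry : map.entrySet()) {
            pairs.add(pairFactory.apply(entry.getKey(), entry.getValue()));
        }
        return pairs;
    }

    public static void main(String[] args) {
        // Sample data shaped like the question's canlist (values are made up).
        HashMap<Integer, String> mentions = new HashMap<>();
        mentions.put(0, "some mention");
        Map<String, HashMap<Integer, String>> canlist = new LinkedHashMap<>();
        canlist.put("some entity", mentions);

        // SimpleEntry is only a stand-in for scala.Tuple2 so this runs without Spark.
        // With Spark on the classpath the factory would be
        //   (k, v) -> new Tuple2<>(k, v)
        // and the resulting list would be passed to jsc.parallelizePairs(...).
        List<Map.Entry<String, HashMap<Integer, String>>> pairs =
                toPairList(canlist, (k, v) -> new AbstractMap.SimpleEntry<>(k, v));

        for (Map.Entry<String, HashMap<Integer, String>> pair : pairs) {
            System.out.println(pair.getKey() + " -> " + pair.getValue());
        }
    }
}
```

Passing the pair constructor in as a function keeps the loop independent of the pair type, so the same helper would work unchanged once `Tuple2` is available.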

Answer 1: (score: 1)

Here is another way: convert the Java HashMap<String, HashMap<Integer,String>> into a List<Tuple2<String, HashMap<Integer,String>>> and pass it to the parallelizePairs() method of JavaSparkContext.

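import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import org.apache.spark.api.java.JavaPairRDD;
import scala.Tuple2;

// Copy every map entry into a list of (key, value) tuples.
List<Tuple2<String, HashMap<Integer,String>>> list = new ArrayList<Tuple2<String, HashMap<Integer,String>>>();
for (Map.Entry<String, HashMap<Integer,String>> entry : canlist.entrySet()) {
    list.add(new Tuple2<String, HashMap<Integer,String>>(entry.getKey(), entry.getValue()));
}

// jsc is an existing JavaSparkContext.
JavaPairRDD<String, HashMap<Integer, String>> javaPairRdd = jsc.parallelizePairs(list);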

Answer 2: (score: 0)

A code snippet with a generic method for the conversion. Pass the result of this method to JavaSparkContext.parallelizePairs().

    import java.util.ArrayList;
    import java.util.List;
    import java.util.Map;
    import scala.Tuple2;

    // fromMapToListTuple2(): generic method to convert Map<T1, T2> to List<Tuple2<T1, T2>>
    public static <T1, T2> List<Tuple2<T1, T2>> fromMapToListTuple2(Map<T1, T2> map)
    {
        // list of tuples
        List<Tuple2<T1, T2>> list = new ArrayList<Tuple2<T1, T2>>();

        // loop through all key-value pairs and add them to the list
        for (Map.Entry<T1, T2> entry : map.entrySet())
        {
            // Tuple2 is not a traditional Java collection; it holds a single key-value pair
            Tuple2<T1, T2> tuple2 = new Tuple2<T1, T2>(entry.getKey(), entry.getValue());

            // populate the list with the created tuple
            list.add(tuple2);
        } // for

        return list;
    } // fromMapToListTuple2