Storing a JavaPairRDD<String, Map<String, List<String>>> into multiple files

Time: 2016-04-28 18:42:34

Tags: java hadoop apache-spark

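I am using Spark 1.6 and trying to solve the following problem. I have a JavaPairRDD<String, Map<String, List<String>>>. I want to store it as multiple output files, with the key of the JavaPairRDD as the outer directory and each key of the Map as the file name.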

For example, if the JavaPairRDD contains the following data:

<"A", <{"A1",["a1","b1","c1"]}, {"A2",["a2","b2","c2"]}>>
<"B", <{"B1",["bb1","bb2","bb3"]}>>

Then the output folder should look like this:

/output/A/A1 (content of A1 should have [a1,b1,c1])
/output/A/A2 (content of A2 should have [a2,b2,c2])
/output/B/B1 (content of B1 should have [bb1,bb2,bb3])

I have the following code, but I am not sure how to change MultipleTextOutputFormat so that it iterates over the value Map.

public static void main(String a[]) {
    // Construction of 'pair' and the definition of 'directory' are not shown.
    JavaPairRDD<String, Map<String, List<String>>> pair;
    pair.saveAsHadoopFile(directory + "/output", String.class, Map.class,
            RDDMultipleTextOutputFormat.class);
}



import java.util.List;
import java.util.Map;
import org.apache.hadoop.mapred.lib.MultipleTextOutputFormat;

public static class RDDMultipleTextOutputFormat<A, B> extends MultipleTextOutputFormat<A, B> {
    @Override
    protected String generateFileNameForKeyValue(A key, B value, String name) {
        return key.toString(); // + "/" + name;
    }

    @Override
    @SuppressWarnings("unchecked")
    protected B generateActualValue(A key, B value) {
        // Attempt to iterate over the value Map. The file names computed in the
        // loop are discarded, so this does not yet produce one file per entry.
        Map<String, List<String>> map = (Map<String, List<String>>) value;
        for (Map.Entry<String, List<String>> entry : map.entrySet()) {
            generateFileNameForKeyValue((A) (key.toString() + "/" + entry.getKey()),
                    (B) (entry.getValue().toString()), entry.getKey());
        }

        //return value.saveAsHadoopFile((Map)value., String.class, Map.class,
        //  RDDMultipleTextOutputFormat.class);

        // Return the value unchanged so that the method compiles.
        return value;
    }

    @Override
    protected A generateActualKey(A key, B value) {
        // A null key makes the writer emit only the value.
        return null;
    }

    /*@Override
    public RecordWriter<A, B> getRecordWriter(FileSystem fs, JobConf job, String name, Progressable prog) throws IOException {
        if (name.startsWith("apple")) {
            return new TextOutputFormat<A, B>().getRecordWriter(fs, job, name, prog);
        } else if (name.startsWith("banana")) {
            return new TextOutputFormat<A, B>().getRecordWriter(fs, job, name, prog);
        }
        return super.getRecordWriter(fs, job, name, prog);
    }*/
}
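
One possible direction, sketched here in plain Java rather than against the Spark API, is to flatten each record before saving so that every Map entry becomes its own (path, content) pair such as ("A/A1", "[a1, b1, c1]"). The output format would then only need to return the key as the file name, which generateFileNameForKeyValue above already does. The names FlattenSketch and flatten are hypothetical and used only for illustration. Running the same per-record logic inside Spark (for example via flatMapToPair before saveAsHadoopFile) is an assumption that this sketch does not show or test.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class FlattenSketch {

    // Turns one (outerKey, innerMap) record into one (relativePath, content)
    // entry per inner-map entry, e.g. ("A/A1", "[a1, b1, c1]").
    static List<Map.Entry<String, String>> flatten(String outerKey,
                                                   Map<String, List<String>> inner) {
        List<Map.Entry<String, String>> out = new ArrayList<>();
        for (Map.Entry<String, List<String>> e : inner.entrySet()) {
            String relativePath = outerKey + "/" + e.getKey();
            // List.toString() renders as "[a1, b1, c1]" (comma plus space).
            String content = e.getValue().toString();
            out.add(new AbstractMap.SimpleEntry<>(relativePath, content));
        }
        return out;
    }

    public static void main(String[] args) {
        // Sample data taken from the example in the question.
        Map<String, List<String>> a = new LinkedHashMap<>();
        a.put("A1", Arrays.asList("a1", "b1", "c1"));
        a.put("A2", Arrays.asList("a2", "b2", "c2"));

        for (Map.Entry<String, String> e : flatten("A", a)) {
            System.out.println("/output/" + e.getKey() + " <- " + e.getValue());
        }
    }
}
```

With the sample record for "A", this prints one line for /output/A/A1 and one for /output/A/A2. Note that List.toString() inserts a space after each comma, so the content is [a1, b1, c1] rather than [a1,b1,c1]; a custom join would be needed if the exact format matters.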

Any help is greatly appreciated.

Thanks, Akhila.

0 answers:

No answers