Question

What I want:

Design MapReduce functions with the following

Input:

key1 \t A1 \t B1

key2 \t A2 \t B2

key3 \t A3 \t B3

...

key30 \t A30 \t B30

Expected output:

"min" \t intersect(A1, A2, A3, ...)

"max" \t intersect(B1, B2, B3, ...)

where A1, A2, A3, ... and B1, B2, B3, ... are sets.

What I did:

I designed the Mapper and Reducer as follows.

Mapper:

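 // Imports assumed at the top of the enclosing file (not shown in the original post):
 // java.io.IOException, java.util.*, com.google.common.base.Joiner,
 // org.apache.hadoop.io.Text, org.apache.hadoop.mapreduce.Mapper, org.apache.hadoop.mapreduce.Reducer
 public static class MapAPP extends Mapper<Text, Text, Text, Text>{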

    public static int j=0,k=0;
    public static List<String> min_pre = new ArrayList<>();
    public static List<String> min_current = new ArrayList<>();
    public static Set<String> min_p1 = new HashSet<>();
    public static Set<String> min_c1 = new HashSet<>();
    public static List<String> min_result = new ArrayList<>(); 
    public static Boolean no_exist_min=false;

    public static List<String> max_pre = new ArrayList<>();
    public static List<String> max_current = new ArrayList<>();
    public static Set<String> max_p1 = new HashSet<>();
    public static Set<String> max_c1 = new HashSet<>();
    public static List<String> max_result = new ArrayList<>(); 
    public static Boolean no_exist_max=false;

    public void map(Text key, Text value, Context con) throws IOException, InterruptedException
    {
        String[] v=value.toString().split("\t");
        // aggregate min
        if (no_exist_min==false){
            if (j==0){
                    min_pre= Arrays.asList(v[0].toString().trim().split("\\|"));
                    j=1;
                 }else{
                    min_current= Arrays.asList(v[0].toString().trim().split("\\|")); 
                    for (String p: min_pre){                   
                       min_p1 = new HashSet<String>(Arrays.asList(p.split(",")));
                       for (String c: min_current){
                           min_c1 = new HashSet<String>(Arrays.asList(c.split(",")));
                           min_c1.retainAll(min_p1);
                           if (!min_c1.isEmpty()){
                               Joiner m_comma = Joiner.on(",").skipNulls();
                               String buff = m_comma.join(min_c1);
                               if (!min_result.contains(buff))
                                    min_result.add(buff);
                           }                       
                       }                   
                    }
                    if (min_result.isEmpty()){
                        no_exist_min=true;          
                    } else {                    
                        min_pre=new ArrayList<>(min_result);
                        min_result.clear();                       
                    }
            }                   
        }

        //aggregate max
        if (no_exist_max==false){
            if (k==0){
                    max_pre= Arrays.asList(v[1].toString().trim().split("\\|"));
                    k=1;
                 }else{
                    max_current= Arrays.asList(v[1].toString().trim().split("\\|")); 
                    for (String p: max_pre){                   
                       max_p1 = new HashSet<String>(Arrays.asList(p.split(",")));
                       for (String c: max_current){
                           max_c1 = new HashSet<String>(Arrays.asList(c.split(",")));
                           max_c1.retainAll(max_p1);
                           if (!max_c1.isEmpty()){
                               Joiner m_comma = Joiner.on(",").skipNulls();
                               String buff = m_comma.join(max_c1);
                               if (!max_result.contains(buff))
                                    max_result.add(buff);
                           }                       
                       }                   
                    }
                    if (max_result.isEmpty()){
                        no_exist_max=true;          
                    } else {                    
                        max_pre=new ArrayList<>(max_result);
                        max_result.clear();                       
                    }
            }                   
        }

    }

    protected void cleanup(Context con) throws IOException, InterruptedException {
        Joiner m_pipe = Joiner.on("|").skipNulls();
        if (no_exist_min==true){
            con.write(new Text("min"), new Text("no_exist"));
        }else {               
            String min_str = m_pipe.join(min_pre);
            con.write(new Text("min"), new Text(min_str)); 

        }

        if (no_exist_max==true){
            con.write(new Text("max"), new Text("no_exist"));
        }else {
            String max_str = m_pipe.join(max_pre);                
            con.write(new Text("max"), new Text(max_str));                
        }            
        min_p1.clear();
        min_c1.clear();
        min_result.clear();

        max_p1.clear();
        max_c1.clear();
        max_result.clear();
    }
}

Reducer:

public static class ReduceAPP extends Reducer<Text, Text, Text, Text>
{
    public void reduce(Text key, Iterable<Text> values, Context con) throws IOException, InterruptedException
    {
        List<String> pre = new ArrayList<>();
        List<String> current = new ArrayList<>();
        Set<String> p1 = new HashSet<>();
        Set<String> c1 = new HashSet<>();
        List<String> result = new ArrayList<>();
        Joiner comma = Joiner.on(",").skipNulls(); 
        Joiner pipe = Joiner.on("|").skipNulls(); 
        Boolean no_exist=false;
        String preStr="";
        int i=0;
        // aggregate
        for(Text value: values){
             // Compare string contents with equals(); == only compares references
             if ("no_exist".equals(value.toString().trim())){
                 no_exist=true;
                 break;
                }
             if (i==0){
                    pre= Arrays.asList(value.toString().trim().split("\\|"));
                    i=1;
             }else{
                    current= Arrays.asList(value.toString().trim().split("\\|")); 
                    for (String p: pre){                   
                       p1 = new HashSet<String>(Arrays.asList(p.split(",")));
                       for (String c: current){
                           c1 = new HashSet<String>(Arrays.asList(c.split(",")));
                           c1.retainAll(p1);
                           if (!c1.isEmpty()){
                               String buff = comma.join(c1);
                               if (!result.contains(buff))
                                    result.add(buff);
                           }                       
                       }                   
                    }
                    if (result.isEmpty()){
                        no_exist=true;
                        break;
                    }
                    pre=new ArrayList<>(result);
                    result.clear();                       
             }                   

        }
        if (no_exist==true){
            con.write(key, new Text("no_exist"));
        }
        else{
            preStr = pipe.join(pre);
            con.write(key, new Text(preStr)); 
        }
        System.out.println("Reducefinished: key="+key.toString()+", value= "+preStr);
    }
    public static <T> Set<T> union(Set<T> setA, Set<T> setB) {
        Set<T> tmp = new TreeSet<T>(setA);
        tmp.addAll(setB);
        return tmp;
    }
}

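The code above looks complicated, but the idea is very simple. To compute intersect(A1, A2, A3), I first compute A12 = intersect(A1, A2), then A13 = intersect(A12, A3), and so on until the end. I do the same for intersect(B1, B2, B3). In fact, my Mapper and Reducer code are almost the same. The difference is where I emit the output: in the cleanup function of each mapper, and in the reduce function of my Reducer.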
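For readers who want the fold step in isolation, here is a minimal sketch in plain Java without any Hadoop classes. It assumes, based on the `split("\\|")` and `split(",")` calls in the code above, that each field is a pipe-separated list of groups and each group is a comma-separated set. The class name `CrossIntersect` and its methods are hypothetical names for illustration, not part of the original code. Unlike the original, it stores each group as a `TreeSet` and deduplicates with a `LinkedHashSet` instead of `List.contains` on joined strings; that is a substitution, not the author's method.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

public class CrossIntersect {

    // Parses "a,b|c,d" into a list of sets: [{a,b}, {c,d}].
    public static List<Set<String>> parse(String field) {
        List<Set<String>> groups = new ArrayList<>();
        for (String group : field.trim().split("\\|")) {
            groups.add(new TreeSet<>(Arrays.asList(group.split(","))));
        }
        return groups;
    }

    // One fold step: intersect every group of pre with every group of
    // current and keep the distinct non-empty results.
    public static List<Set<String>> step(List<Set<String>> pre, List<Set<String>> current) {
        Set<Set<String>> distinct = new LinkedHashSet<>();
        for (Set<String> p : pre) {
            for (Set<String> c : current) {
                Set<String> common = new TreeSet<>(c);
                common.retainAll(p);
                if (!common.isEmpty()) {
                    distinct.add(common);
                }
            }
        }
        return new ArrayList<>(distinct);
    }

    // Folds step over all fields; stops early once nothing is left.
    public static List<Set<String>> intersectAll(List<String> fields) {
        List<Set<String>> acc = new ArrayList<>();
        boolean first = true;
        for (String field : fields) {
            acc = first ? parse(field) : step(acc, parse(field));
            first = false;
            if (acc.isEmpty()) {
                break;
            }
        }
        return acc;
    }

    // Joins groups back into the "a,b|c" form, or "no_exist" when empty.
    public static String format(List<Set<String>> groups) {
        List<String> parts = new ArrayList<>();
        for (Set<String> g : groups) {
            parts.add(String.join(",", g));
        }
        return parts.isEmpty() ? "no_exist" : String.join("|", parts);
    }

    public static void main(String[] args) {
        List<String> fields = Arrays.asList("1,2,3|4,5", "2,3|5,6", "3|5");
        System.out.println(format(intersectAll(fields))); // prints 3|5
    }
}
```

Keeping each group sorted means two equal groups always compare equal, so the duplicate check does not depend on the iteration order of a `HashSet`.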

To make it easier to picture:

My mappers output:

Mapper1:

min \t A16 = intersect(A1, A2, ..., A6)

max \t B16 = intersect(B1, B2, ..., B6)

Mapper2:

min \t A715 = intersect(A7, A8, ..., A15)

max \t B715 = intersect(B7, B8, ..., B15)

Mapper3:

min \t A1622 = intersect(A16, A17, ..., A22)

max \t B1622 = intersect(B16, B17, ..., B22)

Mapper4:

min \t A2330 = intersect(A23, A24, ..., A30)

max \t B2330 = intersect(B23, B24, ..., B30)

My reducers output:

Reducer1: min \t A130 = intersect(A16, A715, A1622, A2330)

Reducer2: max \t B130 = intersect(B16, B715, B1622, B2330)
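The split into per-mapper partial results and a final reducer pass relies on the fold step giving the same answer however the records are grouped. Below is a small plain-Java simulation of that, again assuming the pipe/comma field format seen in the code. `TwoStageFold`, `twoStage`, and the chunk size are hypothetical names and values chosen for illustration; no Hadoop classes are involved.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

public class TwoStageFold {

    // Parses "a,b|c,d" into a set of groups: {{a,b}, {c,d}}.
    public static Set<Set<String>> parse(String field) {
        Set<Set<String>> groups = new LinkedHashSet<>();
        for (String g : field.trim().split("\\|")) {
            groups.add(new TreeSet<>(Arrays.asList(g.split(","))));
        }
        return groups;
    }

    // Pairwise intersection of all groups, keeping distinct non-empty results.
    public static Set<Set<String>> step(Set<Set<String>> pre, Set<Set<String>> cur) {
        Set<Set<String>> out = new LinkedHashSet<>();
        for (Set<String> p : pre) {
            for (Set<String> c : cur) {
                Set<String> common = new TreeSet<>(c);
                common.retainAll(p);
                if (!common.isEmpty()) {
                    out.add(common);
                }
            }
        }
        return out;
    }

    // Folds step over the inputs; an empty result plays the role of "no_exist".
    public static Set<Set<String>> fold(List<Set<Set<String>>> inputs) {
        Set<Set<String>> acc = new LinkedHashSet<>();
        boolean first = true;
        for (Set<Set<String>> in : inputs) {
            acc = first ? in : step(acc, in);
            first = false;
            if (acc.isEmpty()) {
                break;
            }
        }
        return acc;
    }

    // Simulates mappers (one fold per chunk) followed by one reducer fold.
    public static Set<Set<String>> twoStage(List<String> fields, int chunkSize) {
        List<Set<Set<String>>> partials = new ArrayList<>();
        for (int i = 0; i < fields.size(); i += chunkSize) {
            List<Set<Set<String>>> chunk = new ArrayList<>();
            for (String f : fields.subList(i, Math.min(i + chunkSize, fields.size()))) {
                chunk.add(parse(f));
            }
            partials.add(fold(chunk)); // what one mapper would emit in cleanup
        }
        return fold(partials); // what the reducer would compute
    }

    public static void main(String[] args) {
        List<String> fields = Arrays.asList("1,2,3|4,5", "2,3|5,6", "3|5", "3,9|5,7");
        List<Set<Set<String>>> all = new ArrayList<>();
        for (String f : fields) {
            all.add(parse(f));
        }
        // Single pass and two-stage pass should agree as sets of groups.
        System.out.println(fold(all).equals(twoStage(fields, 2)));
    }
}
```

The results are compared as sets of groups because the order in which groups are produced can differ between the two paths even when the contents match.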

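Problem

Because the A sets are small, Reducer1 is easy to compute. But because the size of B >> the size of A, Reducer2 takes a lot of time, and sometimes I get a heap overflow error. What can I do to make my Reducer2 run faster?

Thank you very much.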

MapReduce function for finding set intersections

0 answers: