Map task stuck at 50%

Asked: 2015-04-28 21:59:08

Tags: java hadoop mapreduce

I have a mapper and a reducer class whose output key and value classes are configured as shown below.

//Reducer
job.setOutputKeyClass(LongWritable.class);
job.setOutputValueClass(MapperOutput.class);

//Mapper
job.setMapOutputKeyClass(LongWritable.class);
job.setMapOutputValueClass(MapperOutput.class);
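For context, these setter calls live in a job driver. Below is a minimal driver sketch; Driver, MyMapper and MyReducer are placeholder names, and setJarByClass is worth noting because the log further down warns "No job jar file set".

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class Driver
{
    public static void main(String[] args) throws Exception
    {
        Configuration conf = new Configuration();
        Job job = new Job(conf, "frequent pattern mining"); // 1.x-style constructor, matching the logs below

        job.setJarByClass(Driver.class); // avoids the "No job jar file set" warning

        job.setMapperClass(MyMapper.class);   // placeholder class names
        job.setReducerClass(MyReducer.class);

        job.setMapOutputKeyClass(LongWritable.class);
        job.setMapOutputValueClass(MapperOutput.class);
        job.setOutputKeyClass(LongWritable.class);
        job.setOutputValueClass(MapperOutput.class);

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}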

Here MapperOutput is a custom class I defined that implements the Writable interface.
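For reference, any such class needs a no-argument constructor plus write and readFields implementations. The sketch below is only an assumption about its shape: the question does not show MapperOutput's actual fields, so a transaction count and an opaque synopsis payload stand in here.

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

import org.apache.hadoop.io.Writable;

// Assumed shape only; the real MapperOutput wraps a transaction count and an Hsynopsis.
public class MapperOutput implements Writable
{
    private long transactionCount;
    private byte[] synopsis; // placeholder for the serialized Hsynopsis

    public MapperOutput() {} // required by Hadoop for deserialization

    public MapperOutput(long transactionCount, byte[] synopsis)
    {
        this.transactionCount = transactionCount;
        this.synopsis = synopsis;
    }

    @Override
    public void write(DataOutput out) throws IOException
    {
        out.writeLong(transactionCount);
        out.writeInt(synopsis.length);
        out.write(synopsis);
    }

    @Override
    public void readFields(DataInput in) throws IOException
    {
        transactionCount = in.readLong();
        synopsis = new byte[in.readInt()];
        in.readFully(synopsis);
    }
}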

Part of my mapper function is shown below.

public void map(LongWritable arg0, Text arg1,
        Context context)
        throws IOException 
{
    try
    {
        String tran = null;
        String ip = arg1.toString();
        System.out.println(ip);
        BufferedReader br = new BufferedReader(new StringReader(ip));
        Hsynopsis bdelta = null;
        Hsynopsis b = null, bnew = null;

        hashEntries = (int) Math.floor(calculateHashEntries()); //Hash table size
        System.out.println("Hash entries: "+hashEntries);

        //Initialize the main hash table and delta hashtable
        hashTable = new ArrayList<>(hashEntries);
        for(int i = 0; i < hashEntries; i++)
        {
            hashTable.add(i, null);
        }

        deltahashTable = new ArrayList<>(hashEntries);  
        for(int i = 0; i < hashEntries; i++)
        {
            deltahashTable.add(i, null);
        }

        while((tran = br.readLine())!=null)
        {
            createBinaryRep(tran);
            for(int i = 0; i < deltahashTable.size(); i++)
            {
                bdelta = deltahashTable.get(i);
                if(bdelta != null)
                {
                    if(bdelta.NLast_Access >= (alpha * transactionCount))
                    {
                        //Transmit bdelta to the coordinator
                        MapperOutput mp = new MapperOutput(transactionCount, bdelta);
                        context.write(new LongWritable(i), mp);

                        //Merge bdelta into b
                        b = hashTable.get(i);
                        bnew = merge(b,bdelta);
                        hashTable.set(i, bnew);

                        //Release bdelta
                        deltahashTable.set(i, null);
                    }
                }
            }
        }
    }
    catch(Exception e)
    {
        e.printStackTrace();
    }       
}

My reducer task is as follows.

public void reduce(LongWritable index, Iterator<MapperOutput> mpValues, Context context)
{
    while(mpValues.hasNext())
    {
        /*Some code here */
    }

    context.write(index, mp);
}

As the algorithm requires, the mapper tries to send output to the reducer whenever a condition is satisfied (inside the for loop), and then continues the loop after writing to the context.

When I try to run this code on a single-node Hadoop cluster, I get the following log.

15/04/29 03:19:23 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
15/04/29 03:19:23 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
15/04/29 03:19:23 WARN mapred.JobClient: No job jar file set.  User classes may not be found. See JobConf(Class) or JobConf#setJar(String).
15/04/29 03:19:23 INFO input.FileInputFormat: Total input paths to process : 2
15/04/29 03:19:23 WARN snappy.LoadSnappy: Snappy native library not loaded
15/04/29 03:19:24 INFO mapred.JobClient: Running job: job_local599819429_0001
15/04/29 03:19:24 INFO mapred.LocalJobRunner: Waiting for map tasks
15/04/29 03:19:24 INFO mapred.LocalJobRunner: Starting task: attempt_local599819429_0001_m_000000_0
15/04/29 03:19:24 INFO util.ProcessTree: setsid exited with exit code     0
15/04/29 03:19:24 INFO mapred.Task:  Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@74ff364a
15/04/29 03:19:24 INFO mapred.MapTask: Processing split: file:/home/pooja/ADM/FrequentPatternMining/input/file.dat~:0+24
15/04/29 03:19:24 INFO mapred.MapTask: io.sort.mb = 100
15/04/29 03:19:24 INFO mapred.MapTask: data buffer = 79691776/99614720
15/04/29 03:19:24 INFO mapred.MapTask: record buffer = 262144/327680
15/04/29 03:19:24 INFO mapred.MapTask: Starting flush of map output
15/04/29 03:19:24 INFO mapred.MapTask: Starting flush of map output
15/04/29 03:19:25 INFO mapred.JobClient:  map 0% reduce 0%
15/04/29 03:19:30 INFO mapred.LocalJobRunner: 
15/04/29 03:19:31 INFO mapred.JobClient:  map 50% reduce 0%

The map task gets stuck at 50% and does not progress any further.

When I run the map function on its own (outside Hadoop), I have no infinite-loop problem.

Can anyone please help me with this?

Edit 1: My input files are on the order of kilobytes. Could this be causing a problem in distributing the data to the mappers?

Edit 2: As mentioned in the answer, I changed the Iterator to Iterable. The map task still gets stuck, now at 100%, and restarts after a while.

I can see the following in the jobtracker log:

2015-04-29 13:26:28,026 INFO org.apache.hadoop.mapred.TaskInProgress: Error from attempt_201504291300_0003_m_000000_0: Task attempt_201504291300_0003_m_000000_0 failed to report status for 600 seconds. Killing!
2015-04-29 13:26:28,026 INFO org.apache.hadoop.mapred.JobTracker: Removing task 'attempt_201504291300_0003_m_000000_0'
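That message says the task ran for 600 seconds (the default mapred.task.timeout) without reporting status, so the framework killed and rescheduled it. As a sketch of the usual remedy, not a verified fix for this particular algorithm, a long-running loop can report progress to the framework periodically:

long processed = 0;
while((tran = br.readLine()) != null)
{
    createBinaryRep(tran);
    // ... existing hash table / delta processing ...

    // Let the framework know the task is still alive; a loop that runs
    // for minutes without emitting records will otherwise hit the timeout.
    if(++processed % 1000 == 0)
    {
        context.progress();
        context.setStatus("Processed " + processed + " transactions");
    }
}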

1 Answer:

Answer 0 (score: 0)

You are mistakenly using an Iterator instead of an Iterable in your reduce function.

You need to use Iterable because you are using the new MapReduce API: the reduce(Object, Iterable, org.apache.hadoop.mapreduce.Reducer.Context) method is called once for each key in the sorted input.
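Concretely, the new-API signature would look like the sketch below (the body is a placeholder). Note that if the signature does not match, Hadoop never calls your method and silently falls back to the default identity reduce.

@Override
public void reduce(LongWritable index, Iterable<MapperOutput> mpValues, Context context)
        throws IOException, InterruptedException
{
    for(MapperOutput mp : mpValues)
    {
        /* Some code here */
    }
    // context.write(index, ...); once the merged value is built
}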