Pass all file contents to the map function in MapReduce and append them to a sequence file

Time: 2017-03-29 17:17:18

Tags: hadoop mapreduce hadoop2 hadoop-partitioning sequencefile

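I have to read all the contents of fileA and pass them to the map function. In the map function, the key is fileB and the value is the contents of fileA. In the OutputFormat RecordWriter, I use the sequence file writer's append method to append all the values (the entire contents of fileA) to fileB. The problems are: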

 1. I am loading the entire file contents in the InputFormat RecordReader and passing them to a single map function.
 2. I am appending the entire contents to the sequence file in a single call.

Pseudocode:

InputFormat RecordReader:

  @Override
  public boolean nextKeyValue() throws IOException, InterruptedException {
    // Emit exactly one record: the whole file.
    if (flag > 0) {
      return false;
    }
    flag++;

    String re = /* read all contents of file */;
    String key = k1;

    // Both the key and the full file contents are held in memory here.
    allRecords = new TextArrayWritable(Text.class,
        new Text[] {new Text(key), new Text(re)});
    return true;
  }

@Override
  public TextArrayWritable getCurrentValue() throws IOException, InterruptedException {
    return allRecords;
  }

Map Function:

  protected void map(Text key, TextArrayWritable value, Context context)
      throws IOException, InterruptedException {
    context.write(new Text(fileA path), value);
  }

OutputFormat RecordWriter:

  @Override
  public void write(Text fileDir, TextArrayWritable contents)
      throws IOException, InterruptedException {
    SequenceFileWriter.append(contents.get()[0], contents.get()[1]);
  }

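If the file is too large, both operations happen entirely in memory and may fail with an out-of-memory error. Is there a way to avoid loading the whole content into memory and still append it to the sequence file?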
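To make the question concrete, here is a minimal sketch of one direction that might work: have the reader hand out a bounded chunk per nextKeyValue() call instead of the whole file, so that each chunk can be appended as its own record. This is plain Java with no Hadoop dependency; the class name ChunkedReader, the chunk size, and the use of a chunk index as the key are all assumptions made for illustration, not part of my actual job.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

/**
 * Hypothetical sketch: mirrors the nextKeyValue()/getCurrentValue() contract
 * of a RecordReader, but returns at most chunkSize bytes per record.
 */
class ChunkedReader implements AutoCloseable {
    private final InputStream in;
    private final byte[] buffer;   // reused for every chunk
    private int length;            // number of valid bytes in buffer
    private long chunkIndex = -1;  // key of the current chunk

    ChunkedReader(InputStream in, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        this.in = in;
        this.buffer = new byte[chunkSize];
    }

    /** Reads the next chunk; returns false once the stream is exhausted. */
    boolean nextKeyValue() throws IOException {
        int filled = 0;
        // read() may return fewer bytes than requested, so loop until the
        // buffer is full or the end of the stream is reached.
        while (filled < buffer.length) {
            int n = in.read(buffer, filled, buffer.length - filled);
            if (n < 0) {
                break;
            }
            filled += n;
        }
        if (filled == 0) {
            return false;
        }
        length = filled;
        chunkIndex++;
        return true;
    }

    /** Zero-based index of the current chunk. */
    long getCurrentKey() {
        return chunkIndex;
    }

    /** Copy of the current chunk; only this many bytes are held at once. */
    byte[] getCurrentValue() {
        return Arrays.copyOf(buffer, length);
    }

    @Override
    public void close() throws IOException {
        in.close();
    }
}
```

With something like this, memory use would be bounded by the chunk size rather than the file size, and the RecordWriter could call append once per chunk (for example with a BytesWritable value, which is again an assumption). Note that splitting on raw bytes can cut a multi-byte character in half, so the chunks should be treated as bytes rather than converted to Text individually.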

0 answers:

No answers