Question

Is there a way to generate permutations with MapReduce?

Input file:

1  title1
2  title2
3  title3

My goal:

1,2  title1,title2
1,3  title1,title3
2,3  title2,title3

Answer 1

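Since the file has n input lines, the permutations should produce n^2 outputs. It makes sense to have n tasks each perform n of those operations. I believe you can do it like this (assuming there is only one file):

Put your input file into the DistributedCache so that your Mapper/Reducers can access it read-only. Split the input on each line of the file (as in WordCount), so each mapper call receives one line (for example, the title1 line in your example). Then read the lines of the file from the DistributedCache and emit key/value pairs: the key is the input line, and the value is each line of the DistributedCache file in turn.

In this model you only need a Map step.

Something like this: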
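As a quick check of the counts (a worked note, not part of the original answer): pairing every line with every line includes self-pairs and both orderings, while the three rows in the question's goal are unordered pairs. For the three-line example, n = 3:

```latex
\begin{aligned}
\text{all ordered pairs (with self-pairs):} \quad & n^2 = 3^2 = 9 \\
\text{ordered pairs, no self-pairs:} \quad & n(n-1) = 3 \cdot 2 = 6 \\
\text{unordered pairs (the goal in the question):} \quad & \tfrac{n(n-1)}{2} = 3
\end{aligned}
```

So the map-only cross product produces more rows than the goal lists, and some filtering on the pair would be needed to get down to the three shown.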

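  import java.io.IOException;
  import org.apache.hadoop.filecache.DistributedCache;
  import org.apache.hadoop.fs.Path;
  import org.apache.hadoop.io.Text;
  import org.apache.hadoop.mapreduce.Mapper;

  // nested inside the job's driver class
  public static class PermuteMapper
       extends Mapper<Object, Text, Text, Text> {

    private static final String IN_FILENAME = "file.txt";

    @Override
    public void map(Object key, Text value, Context context
                    ) throws IOException, InterruptedException {

      String inputLine = value.toString();

      // set the property mapred.cache.files in your
      // configuration for the file to be available
      Path[] cachedPaths =
          DistributedCache.getLocalCacheFiles(context.getConfiguration());
      if (cachedPaths != null && cachedPaths.length > 0
          && cachedPaths[0].getName().equals(IN_FILENAME)) {
        // getLinesFromPath is defined elsewhere (not shown)
        String[] cachedLines = getLinesFromPath(cachedPaths[0]);
        for (String line : cachedLines) {
          // pair the current input line with every line of the cached file
          context.write(new Text(inputLine), new Text(line));
        }
      }
    }
  }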
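
The mapper above pairs each input line with every line of the cached file, self-pairs and both orderings included, whereas the question's goal lists each pair once. Below is a minimal, Hadoop-free sketch of one way to bridge that gap, assuming the ids are integers and that the id and title are separated by whitespace: keep a pair only when the first id is smaller than the second. `PairSketch`, `parse`, and `pairsFor` are hypothetical names used for illustration only; in the real job the same check would sit inside `map()` just before the pair is emitted.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hadoop-free sketch: simulates what each map() call would emit.
class PairSketch {

    // Hypothetical helper: splits "1  title1" into {"1", "title1"}.
    static String[] parse(String line) {
        return line.trim().split("\\s+", 2);
    }

    // Simulates one map() call: pairs inputLine with every cached line,
    // keeping only pairs whose first id is smaller than the second
    // (assumes integer ids).
    static List<String> pairsFor(String inputLine, List<String> cachedLines) {
        String[] left = parse(inputLine);
        int leftId = Integer.parseInt(left[0]);
        List<String> out = new ArrayList<>();
        for (String cached : cachedLines) {
            String[] right = parse(cached);
            if (leftId < Integer.parseInt(right[0])) {
                out.add(left[0] + "," + right[0] + "  " + left[1] + "," + right[1]);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> file = Arrays.asList("1  title1", "2  title2", "3  title3");
        for (String line : file) {              // one iteration per map() call
            for (String pair : pairsFor(line, file)) {
                System.out.println(pair);
            }
        }
    }
}
```

Run on the three-line example, `main` prints the three rows from the question's goal. Dropping the `leftId < rightId` check gives back the full n^2 cross product that the mapper above produces.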
