Getting friends of a given degree with MapReduce

Asked: 2013-07-25 20:43:33

Tags: hadoop mapreduce social-networking graph-theory hadoop-streaming

Do you know how to implement this algorithm using the MapReduce paradigm?

def getFriends(self, degree):
    friendList = []
    self._getFriends(degree, friendList)
    return friendList

def _getFriends(self, degree, friendList):
    friendList.append(self)
    if degree:
        for friend in self.friends:
            friend._getFriends(degree-1, friendList)
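For context, the two methods above can be made runnable with a minimal `User` class; the class name, the `uid` field, and the `befriend` helper are assumptions added for illustration, not part of the original question:

```python
class User:
    def __init__(self, uid):
        self.uid = uid
        self.friends = []

    def befriend(self, other):
        # friendships are bidirectional
        self.friends.append(other)
        other.friends.append(self)

    def getFriends(self, degree):
        friendList = []
        self._getFriends(degree, friendList)
        return friendList

    def _getFriends(self, degree, friendList):
        friendList.append(self)
        if degree:
            for friend in self.friends:
                friend._getFriends(degree - 1, friendList)
```

Note that the recursion keeps no visited set, so on a bidirectional graph the returned list contains the root itself and many duplicates; deduplicate with a set if you only want the distinct connections.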

Assume we have the following bidirectional friendships:

(1,2),(1,3),(1,4),(4,5),(4,6),(5,7),(5,8)

For example, how do I get the first-, second-, and third-degree connections of user 1? The answer must be 1 -> 2,3,4,5,6,7,8.

Thanks

3 answers:

Answer 0 (score: 0)

Maybe you could use Hive, which supports SQL-like queries!

Answer 1 (score: 0)

As I understand it, you want to collect all friends in the n-th circle of some person in a social graph. Most graph algorithms are recursive, and recursion is not well suited to the MapReduce way of solving tasks.

I can suggest using Apache Giraph to solve this problem (under the hood it actually runs on MapReduce). It is mostly asynchronous, and you write jobs that describe the behaviour of a single node, like:

1. Send a message from the root node to all friends asking for their friend lists.
2.1. Each friend sends a message with its friend list back to the root node.
2.2. Each friend sends a message to all its sub-friends asking for their friend lists.
3.1. Each sub-friend sends a message with its friend list back to the root node.
3.2. Each sub-friend sends a message to all its sub-sub-friends asking for their friend lists.
...
N. The root node collects all these messages and merges them into a single list.
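Outside of Giraph, the superstep flow above can be sketched as a plain-Python simulation, in which each round the current frontier "sends" its friend lists back to the root; the function name and adjacency representation are illustrative, not Giraph API:

```python
def collect_circles(adjacency, root, degree):
    """Simulate the message-passing rounds: in each superstep the
    current frontier reports its friend lists, and the newly
    discovered nodes become the next frontier."""
    collected = set()
    frontier = {root}
    seen = {root}
    for _ in range(degree):
        messages = set()
        for node in frontier:
            messages.update(adjacency.get(node, ()))
        frontier = messages - seen  # only newly discovered nodes keep propagating
        seen |= frontier
        collected |= frontier
    return collected
```

In real Giraph this logic lives in a per-vertex compute method, with one superstep per round; the simulation just makes the round structure explicit.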

Alternatively, you could use a cascade of map-reduce jobs to collect the circles, but this is not a very efficient way to solve the task:

  1. Export the root user's friends to a file circle-001
  2. Using circle-001 as the job's input, export the friends of each user in circle-001 to circle-002
  3. Do the same, but use circle-002 as input
  4. ...
  5. Repeat N times

If you have many users whose circles need computing, the first approach is more suitable. The second carries the huge overhead of launching multiple MR jobs, but it is much simpler and fine for a small set of input users.
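A minimal sketch of this chaining, with in-memory sets standing in for the circle-001/circle-002 files (function names and the visited-set bookkeeping are illustrative additions):

```python
def next_circle(edges, current_circle, visited):
    """One 'job': scan all edges and emit the neighbours of the
    current circle that have not been visited yet."""
    result = set()
    for a, b in edges:
        if a in current_circle and b not in visited:
            result.add(b)
        if b in current_circle and a not in visited:  # edges are bidirectional
            result.add(a)
    return result

def circles_up_to(edges, root, n):
    visited = {root}
    circle = {root}        # "circle-000": just the root
    all_friends = set()
    for _ in range(n):     # repeat N times, feeding each output back in as input
        circle = next_circle(edges, circle, visited)
        visited |= circle
        all_friends |= circle
    return all_friends
```

Each call to `next_circle` corresponds to one full MR job over the edge list, which is exactly where the overhead of this approach comes from.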

Answer 2 (score: 0)

I'm new to this area, but here is my take.

You could use a traditional BFS algorithm, following the pseudocode below.

On each iteration you launch a Hadoop job that discovers all child nodes of the current working set that have not been visited yet.

BFS (list curNodes, list visited, int depth){
    if (depth <= 0){
        return visited;
    }

    //run Hadoop job on the current working set curNodes restricted by visited

    //the job will populate some result list with the list of child nodes of the current working set

    //then,

    visited.addAll(result);
    curNodes.empty();
    curNodes.addAll(result);

    return BFS(curNodes, visited, depth-1);
}

The mapper and reducer for this job would look as follows.

In this example I simply use static members to hold the working set, the visited set, and the result set.

It should really be implemented using temporary files. There are probably ways to optimize the persistence of the temporary data accumulated from one iteration to the next.

The input file I used for the job contains the list of tuples, one per line, e.g. 1,2 2,3 5,4 ... ...

  public static class VertexMapper extends
      Mapper<Object, Text, IntWritable, IntWritable> {

    private static Set<IntWritable> curVertex = null;
    private static IntWritable curLevel = null;
    private static Set<IntWritable> visited = null;

    private IntWritable key = new IntWritable();
    private IntWritable value = new IntWritable();

    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {

      StringTokenizer itr = new StringTokenizer(value.toString(), ",");
      if (itr.countTokens() == 2) {
        String keyStr = itr.nextToken();
        String valueStr = itr.nextToken();
        try {
          this.key.set(Integer.parseInt(keyStr));
          this.value.set(Integer.parseInt(valueStr));

          if (VertexMapper.curVertex.contains(this.key)
              && !VertexMapper.visited.contains(this.value)
              // compare the parsed ids; the raw map key is the byte offset,
              // so key.equals(value) would compare the wrong objects
              && !this.key.equals(this.value)) {
            context.write(VertexMapper.curLevel, this.value);
          }
        } catch (NumberFormatException e) {
          System.err.println("Found key,value <" + keyStr + "," + valueStr
              + "> which cannot be parsed as int");
        }
      } else {
        System.err.println("Found malformed line: " + value.toString());
      }
    }
  }

  public static class UniqueReducer extends
      Reducer<IntWritable, IntWritable, IntWritable, IntWritable> {

    private static Set<IntWritable> result = new HashSet<IntWritable>();

    public void reduce(IntWritable key, Iterable<IntWritable> values,
        Context context) throws IOException, InterruptedException {

      for (IntWritable val : values) {
        UniqueReducer.result.add(new IntWritable(val.get()));
      }
      // context.write(key, key);
    }
  }

Running the job looks like this:

UniqueReducer.result.clear();
VertexMapper.curLevel = new IntWritable(1);
VertexMapper.curVertex = new HashSet<IntWritable>(1);
VertexMapper.curVertex.add(new IntWritable(1));
VertexMapper.visited = new HashSet<IntWritable>(1);
VertexMapper.visited.add(new IntWritable(1));

Configuration conf = getConf();
Job job = new Job(conf, "BFS");
job.setJarByClass(BFSExample.class);
job.setMapperClass(VertexMapper.class);
// no combiner: UniqueReducer never writes to the context, so as a
// combiner it would swallow all map output before the reduce phase
job.setReducerClass(UniqueReducer.class);
job.setOutputKeyClass(IntWritable.class);
job.setOutputValueClass(IntWritable.class);
FileInputFormat.addInputPath(job, new Path(args[0]));
job.setOutputFormatClass(NullOutputFormat.class);
boolean result = job.waitForCompletion(true);

BFSExample bfs = new BFSExample();
ToolRunner.run(new Configuration(), bfs, args);