Passing all collections of a MongoDB database as input to a Hadoop MapReduce job

Asked: 2013-12-24 08:03:38

Tags: mongodb hadoop mapreduce mongo-collection

I need to pass all the collections of my MongoDB database as input to a Hadoop MR job. There is an approach that allows multiple inputs:

    MultiCollectionSplitBuilder mcsb = new MultiCollectionSplitBuilder();
    mcsb.add(new MongoURI("mongodb://localhost:27017/mongo_hadoop.yield_historical.in"),
            (MongoURI)null, // authuri
            true, // notimeout
            (DBObject)null, // fields
            (DBObject)null, // sort
            (DBObject)null, // query
            false, // range query
            MultiMongoCollectionSplitter.class)
    .add(new MongoURI("mongodb://localhost:27017/mongo_hadoop.yield_historical.in"),
            (MongoURI)null, // authuri
            true, // notimeout
            (DBObject)null, // fields
            (DBObject)null, // sort
            new BasicDBObject("_id", new BasicDBObject("$gt", new Date(883440000000L))), // query
            false, // range query
            MultiMongoCollectionSplitter.class);

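However, my database has 10 collections, and the approach above only passes 2 collections as arguments. What I need is to use every collection individually in the mapper method. My reducer is the same for all of them.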

Any help is appreciated.

1 Answer:

Answer 0: (score: 0)

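You can keep chaining add calls onto the MultiCollectionSplitBuilder, one per collection. The example below repeats the same URI for brevity; replace it with the URI of each collection you want to read: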

    MultiCollectionSplitBuilder mcsb = new MultiCollectionSplitBuilder();
    mcsb
            .add(new MongoURI("mongodb://localhost:27017/mongo_hadoop.yield_historical.in"),
                    (MongoURI) null, // authuri
                    true, // notimeout
                    (DBObject) null, // fields
                    (DBObject) null, // sort
                    (DBObject) null, // query
                    false, // range query
                    MultiMongoCollectionSplitter.class
            )
            .add(new MongoURI("mongodb://localhost:27017/mongo_hadoop.yield_historical.in"),
                    (MongoURI) null, // authuri
                    true, // notimeout
                    (DBObject) null, // fields
                    (DBObject) null, // sort
                    new BasicDBObject("_id", new BasicDBObject("$gt", new Date(883440000000L))),
                    false, // range query
                    MultiMongoCollectionSplitter.class
            )
            .add(new MongoURI("mongodb://localhost:27017/mongo_hadoop.yield_historical.in"),
                    (MongoURI) null, // authuri
                    true, // notimeout
                    (DBObject) null, // fields
                    (DBObject) null, // sort
                    new BasicDBObject("_id", new BasicDBObject("$gt", new Date(883440000000L))),
                    false, // range query
                    MultiMongoCollectionSplitter.class
            )
    ;
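
With 10 collections, writing each add call by hand gets repetitive, so the URIs could be generated from a list of collection names instead. The following is only a minimal sketch under stated assumptions: CollectionUris and build are hypothetical helper names, not part of mongo-hadoop, and the second collection name is a placeholder. It uses only the standard library and produces URI strings in the same mongodb://host:port/db.collection form used above. Each resulting string would then be wrapped in new MongoURI(...) and passed to mcsb.add(...) with the same remaining arguments as in the calls above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper (not part of mongo-hadoop): builds one input URI per collection.
final class CollectionUris {

    // Returns "mongodb://<hostPort>/<database>.<collection>" for each collection name,
    // preserving the order of the input list.
    static List<String> build(String hostPort, String database, List<String> collections) {
        List<String> uris = new ArrayList<>();
        for (String name : collections) {
            uris.add("mongodb://" + hostPort + "/" + database + "." + name);
        }
        return uris;
    }

    public static void main(String[] args) {
        // "yield_historical.in" comes from the question; "another_collection" is a placeholder.
        List<String> names = Arrays.asList("yield_historical.in", "another_collection");
        for (String uri : build("localhost:27017", "mongo_hadoop", names)) {
            // In the real job, this is where mcsb.add(new MongoURI(uri), ...) would go.
            System.out.println(uri);
        }
    }
}
```

Inside the loop, the builder call would take the place of the println, so the number of inputs follows the length of the name list rather than the number of hand-written add calls.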