How to filter by _id in MongoDB using Pig

Date: 2016-02-03 03:19:13

Tags: mongodb hadoop apache-pig

I have MongoDB documents like this:

db.activity_days.findOne()
{
    "_id" : ObjectId("54b4ee617acf9ce0440a3185"),
    "aca" : 0,
    "ca" : 0,
    "cbdw" : true,
    "day" : ISODate("2014-12-10T00:00:00Z"),
    "dm" : 0,
    "fbc" : 0,
    "go" : 2500,
    "gs" : [ ],
    "its" : [
        {
            "_id" : ObjectId("551ac8d44f9f322e2b055d3a"),
            "at" : 2000,
            "atn" : "Running",
            "cas" : 386.514909469507,
            "dis" : 2.788989730832084,
            "du" : 1472,
            "ibr" : false,
            "ide" : false,
            "lcs" : false,
            "pt" : 0,
            "rpt" : 0,
            "src" : 1001,
            "stp" : 0,
            "tcs" : [ ],
            "ts" : 1418257729,
            "u_at" : ISODate("2015-01-13T00:32:10.954Z")
        }
    ],
    "po" : 0,
    "se" : 0,
    "st" : 0,
    "tap3c" : [ ],
    "tzo" : -21600,
    "u_at" : ISODate("2015-01-13T00:32:10.952Z"),
    "uid" : ObjectId("545eb753ae9237b1df115649")
}

I want to use Pig to filter on a specific _id range. In the mongo shell I can write the query like this:

db.activity_day.find({_id:{$gt:ObjectId("54a48e000000000000000000"),$lt:ObjectId("54cd6c800000000000000000")}})

But I don't know how to write this in Pig. Does anyone know how?

1 answer:

Answer 0 (score: 0)

You can try the mongo-hadoop connector for Pig; see mongo-hadoop: Usage with Pig.

Once you have registered the JARs (core, pig, and the Java driver) with REGISTER, for example REGISTER /path-to/mongo-hadoop-pig-<version>.jar; from grunt, you can run:

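-- Restrict the input with a MongoDB query (extended JSON, with $ escaped as \$)
SET mongo.input.query '{"_id":{"\$gt":{"\$oid":"54a48e000000000000000000"},"\$lt":{"\$oid":"54cd6c800000000000000000"}}}';
-- Load the matching documents through the connector's loader
rangeActivityDay = LOAD 'mongodb://localhost:27017/database.collection' USING com.mongodb.hadoop.pig.MongoLoader();
DUMP rangeActivityDay;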

You may also want to use LIMIT before dumping the data.

The above was tested with: mongo-java-driver-3.0.0-rc1.jar, mongo-hadoop-pig-1.4.0.jar, mongo-hadoop-core-1.4.0.jar, and MongoDB v3.0.9.
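
A note on the two bounds used in mongo.input.query above: the first four bytes of an ObjectId hold its creation time in seconds since the Unix epoch, so a range on _id is effectively a range on creation time. The two values in the question decode to 2015-01-01 and 2015-02-01 (UTC). Below is a minimal sketch, in plain JavaScript (the language of the mongo shell), of how such bounds and the matching SET line could be generated from dates. The helper names objectIdHexFromDate and pigInputQuery are hypothetical and are not part of MongoDB or mongo-hadoop; the "\$" escaping simply mirrors the form used in the answer.

```javascript
// Hypothetical helper: hex string of the smallest ObjectId whose embedded
// timestamp equals the given instant (4-byte seconds, then 8 zero bytes).
function objectIdHexFromDate(date) {
  const seconds = Math.floor(date.getTime() / 1000);
  return seconds.toString(16).padStart(8, "0") + "0".repeat(16);
}

// Hypothetical helper: builds the Pig SET statement for an _id range,
// writing every "$" as "\$" as in the answer's query string.
function pigInputQuery(from, to) {
  const query = {
    _id: {
      $gt: { $oid: objectIdHexFromDate(from) },
      $lt: { $oid: objectIdHexFromDate(to) },
    },
  };
  const json = JSON.stringify(query).split("$").join("\\$");
  return "SET mongo.input.query '" + json + "';";
}

const from = new Date(Date.UTC(2015, 0, 1)); // 2015-01-01T00:00:00Z
const to = new Date(Date.UTC(2015, 1, 1));   // 2015-02-01T00:00:00Z

console.log(objectIdHexFromDate(from)); // 54a48e000000000000000000
console.log(objectIdHexFromDate(to));   // 54cd6c800000000000000000
console.log(pigInputQuery(from, to));
```

The last line prints a SET statement that can be pasted into grunt or a Pig script ahead of the LOAD. Because the bounds are padded with zeros, $gt on the lower bound still matches every real ObjectId created at or after that second.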