I have MongoDB documents like this:
db.activity_days.findOne()
{
"_id" : ObjectId("54b4ee617acf9ce0440a3185"),
"aca" : 0,
"ca" : 0,
"cbdw" : true,
"day" : ISODate("2014-12-10T00:00:00Z"),
"dm" : 0,
"fbc" : 0,
"go" : 2500,
"gs" : [ ],
"its" : [
{
"_id" : ObjectId("551ac8d44f9f322e2b055d3a"),
"at" : 2000,
"atn" : "Running",
"cas" : 386.514909469507,
"dis" : 2.788989730832084,
"du" : 1472,
"ibr" : false,
"ide" : false,
"lcs" : false,
"pt" : 0,
"rpt" : 0,
"src" : 1001,
"stp" : 0,
"tcs" : [ ],
"ts" : 1418257729,
"u_at" : ISODate("2015-01-13T00:32:10.954Z")
}
],
"po" : 0,
"se" : 0,
"st" : 0,
"tap3c" : [ ],
"tzo" : -21600,
"u_at" : ISODate("2015-01-13T00:32:10.952Z"),
"uid" : ObjectId("545eb753ae9237b1df115649")
}
I want to use Pig to filter on a specific _id range. I can write the MongoDB query like this:
db.activity_days.find({_id:{$gt:ObjectId("54a48e000000000000000000"),$lt:ObjectId("54cd6c800000000000000000")}})
But I don't know how to write this in Pig. Does anyone know?
Answer 0 (score: 0)
You can try the mongo-hadoop connector for Pig; see mongo-hadoop: Usage with Pig.
Once you have REGISTERed the JARs (core, pig, and the Java driver), for example REGISTER /path-to/mongo-hadoop-pig-<version>.jar; from grunt, you can run:
SET mongo.input.query '{"_id":{"\$gt":{"\$oid":"54a48e000000000000000000"},"\$lt":{"\$oid":"54cd6c800000000000000000"}}}'
rangeActivityDay = LOAD 'mongodb://localhost:27017/database.collection' USING com.mongodb.hadoop.pig.MongoLoader();
DUMP rangeActivityDay;
You may also want to apply LIMIT before dumping the data.
The above was tested with mongo-java-driver-3.0.0-rc1.jar, mongo-hadoop-pig-1.4.0.jar, mongo-hadoop-core-1.4.0.jar, and MongoDB v3.0.9.
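As a side note, the boundary ObjectIds in the query are not arbitrary. The first four bytes of an ObjectId hold the creation time in seconds since the Unix epoch, so 54a48e00... corresponds to 2015-01-01T00:00:00Z and 54cd6c80... to 2015-02-01T00:00:00Z. Below is a minimal sketch in plain JavaScript (no driver needed) for building such boundaries and the query string from dates. The helper names objectIdHexFromDate and idRangeQuery are hypothetical, not part of mongo-hadoop or any MongoDB driver.

```javascript
// Hypothetical helpers for illustration; not part of any MongoDB library.

// Smallest ObjectId hex string whose embedded timestamp equals `date`:
// 4 bytes of big-endian seconds since the epoch, then 8 zero bytes.
function objectIdHexFromDate(date) {
  const seconds = Math.floor(date.getTime() / 1000);
  return seconds.toString(16).padStart(8, "0") + "0".repeat(16);
}

// Extended-JSON query matching _id values strictly between the two
// boundary ids derived from `from` and `to`.
function idRangeQuery(from, to) {
  return JSON.stringify({
    _id: {
      $gt: { $oid: objectIdHexFromDate(from) },
      $lt: { $oid: objectIdHexFromDate(to) },
    },
  });
}

const from = new Date("2015-01-01T00:00:00Z");
const to = new Date("2015-02-01T00:00:00Z");
console.log(objectIdHexFromDate(from));
console.log(idRangeQuery(from, to));
```

When pasting the generated string into a Pig script, each `$` still has to be escaped as `\$`, as in the SET line above.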