Question

I have 5 million entries in a Mongo database that look like this:

{
    "_id" : ObjectId("525facace4b0c1f5e78753ea"),
    "productId" : null,
    "name" : "example name",
    "time" : ISODate("2013-10-17T09:23:56.131Z"),
    "type" : "hover",
    "url" : "www.example.com",
    "userAgent" : "curl/7.24.0 (x86_64-apple-darwin12.0) libcurl/7.24.0 openssl/0.9.8r zlib/1.2.5"
}

I need to add a new field named device to every entry, with a value of either desktop or mobile. In other words, the goal is to end up with entries like this:

{
    "_id" : ObjectId("525facace4b0c1f5e78753ea"),
    "productId" : null,
    "device" : "desktop",
    "name" : "example name",
    "time" : ISODate("2013-10-17T09:23:56.131Z"),
    "type" : "hover",
    "url" : "www.example.com",
    "userAgent" : "curl/7.24.0 (x86_64-apple-darwin12.0) libcurl/7.24.0 openssl/0.9.8r zlib/1.2.5"
}

I am using the MongoDB Java driver, and so far I am doing the following:

DBObject query = new BasicDBObject();
query.put("device", new BasicDBObject("$exists", false)); //some entries already have such field
DBCursor cursor = resource.find(query);
cursor.addOption(Bytes.QUERYOPTION_NOTIMEOUT);
Iterator<DBObject> iterator = cursor.iterator();
int size = cursor.count();

I then iterate with while(iterator.hasNext()), run an if-else against a huge regular expression I found, and depending on the result of the if-else do something like this:

BasicDBObject newDocument = new BasicDBObject("$set", new BasicDBObject().append("device", "desktop")); // or "mobile", depending on the if-else
BasicDBObject searchQuery = new BasicDBObject("_id", id);               
resource.getCollection(DatabaseConfiguration.WEBSITE_STATISTICS).update(searchQuery, newDocument);
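For illustration, the if-else step mentioned above could look roughly like the following. This is only a minimal sketch: the class name DeviceClassifier, the method classify, and the pattern MOBILE_HINTS are hypothetical, and the pattern is a heavily simplified stand-in for the much larger regular expression actually used.

```java
import java.util.regex.Pattern;

class DeviceClassifier {
    // Hypothetical, heavily simplified pattern. The real regular expression
    // referred to in the question is much larger and is not reproduced here.
    private static final Pattern MOBILE_HINTS = Pattern.compile(
            "android|iphone|ipod|ipad|windows phone|blackberry|opera mini|mobile",
            Pattern.CASE_INSENSITIVE);

    // Returns "mobile" when the user agent contains one of the hints,
    // and "desktop" otherwise (including when the user agent is null).
    static String classify(String userAgent) {
        if (userAgent != null && MOBILE_HINTS.matcher(userAgent).find()) {
            return "mobile";
        }
        return "desktop";
    }

    public static void main(String[] args) {
        System.out.println(classify(
                "curl/7.24.0 (x86_64-apple-darwin12.0) libcurl/7.24.0 openssl/0.9.8r zlib/1.2.5"));
        System.out.println(classify(
                "Mozilla/5.0 (iPhone; CPU iPhone OS 7_0 like Mac OS X) AppleWebKit/537.51.1 Mobile/11A465"));
    }
}
```

The returned string is what would go into the $set document in place of the hard-coded "desktop".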

However, because of the amount of data (more than 5 million entries), this takes forever.

Is there a way to do this with map reduce? So far I have only used MapReduce for counting, so I am not sure whether it can be used for other things.

Answer 1

I found a way, although it is a bit tricky because of all the configuration involved.

After installing Hadoop by following this link, I did the following:

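1. Created a class named MongoUpdate with a run method, in which I set all the configuration (such as the input and output URIs), create the job, and configure all of its settings. Among them is job.setMapperClass(MongoMapper.class).
2. Created MongoMapper, whose map method receives a BSONObject. There I do the if-else condition, and at the end I execute:

Text id = new Text(pValue.get("_id").toString());
pContext.write(id, new BSONWritable(pValue));

3. Created a class Main, whose main method just instantiates the MongoUpdate class and calls its run method.
4. Exported a jar containing all the libraries and typed in the terminal: hadoop jar NameOfTheJar.jar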
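The data flow of the map step described above can be sketched in plain Java without any Hadoop dependencies. This is not the actual MongoMapper: a java.util.Map stands in for the BSONObject, a Map.Entry stands in for the (Text, BSONWritable) pair passed to pContext.write, and the names MapStepSketch, map, and classify are hypothetical. It also assumes the mapper sets the device field on the document before writing it, which the steps above imply but do not show.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

class MapStepSketch {
    // Hypothetical stand-in for the if-else condition; the real check is
    // a much larger regular expression that is not reproduced here.
    static String classify(String userAgent) {
        boolean mobile = userAgent != null
                && userAgent.toLowerCase(Locale.ROOT).contains("mobile");
        return mobile ? "mobile" : "desktop";
    }

    // Mirrors the shape of the map step: tag a copy of the document with
    // "device" and emit it keyed by the string form of its _id.
    static Map.Entry<String, Map<String, Object>> map(Map<String, Object> doc) {
        Map<String, Object> out = new LinkedHashMap<>(doc);
        out.put("device", classify((String) doc.get("userAgent")));
        return new SimpleEntry<>(String.valueOf(doc.get("_id")), out);
    }

    public static void main(String[] args) {
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("_id", "525facace4b0c1f5e78753ea");
        doc.put("userAgent", "curl/7.24.0 (x86_64-apple-darwin12.0) libcurl/7.24.0");
        Map.Entry<String, Map<String, Object>> emitted = map(doc);
        System.out.println(emitted.getKey() + " -> " + emitted.getValue());
    }
}
```

Because each document is handled independently, with no reduce step needed, the work can be split across mappers.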
