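I am using the Google mapreduce lib to process my data. Counters can be used in the mapper function while the data is being processed, but I don't know how to get the counter results in the finalized method.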
from mapreduce import base_handler, mapreduce_pipeline, operation


def mapper(obj):
    yield obj
    # Increment a custom counter once per processed entity.
    yield operation.counters.Increment("process-obj")


class Test(base_handler.PipelineBase):
    """A pipeline to ingest logs as CSV in Google Storage."""

    def run(self, setting_id):
        filepath = yield mapreduce_pipeline.MapperPipeline(
            "test",
            "mapper",
            "mapreduce.input_readers.DatastoreInputReader",
            output_writer_spec="mapreduce.output_writers.FileOutputWriter",
            params={
            },
            shards=10
        )

    def finalized(self):
        # how to read the counter process-obj
        # how to get the setting_id
        pass
Answer 0 (score: 2)
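Named outputs may be what you are looking for. You can find more details here.
The following code uses a named output to get the various counters, including the one you defined: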
from mapreduce import base_handler, mapreduce_pipeline, operation


def mapper(obj):
    yield obj
    yield operation.counters.Increment("process-obj")


class Test(base_handler.PipelineBase):
    """A pipeline to ingest logs as CSV in Google Storage."""

    # Named output slot that finalized() reads the counters from.
    output_names = ['counters']

    def run(self, setting_id):
        results = yield mapreduce_pipeline.MapperPipeline(
            "test",
            "mapper",
            "mapreduce.input_readers.DatastoreInputReader",
            output_writer_spec="mapreduce.output_writers.FileOutputWriter",
            params={
            },
            shards=10
        )
        # Hand the MapperPipeline's counters output to MapreduceResult.
        yield MapreduceResult(results.counters)

    def finalized(self):
        # self.outputs.counters is a slot; .value holds what was filled in.
        print 'Counters here: ', self.outputs.counters.value


class MapreduceResult(base_handler.PipelineBase):

    def run(self, counters):
        self.fill(self.outputs.counters, counters)
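
To pull out just the "process-obj" total, something like the sketch below might help once the counters are available in finalized. It assumes the counters value is a plain dict mapping counter names to integer totals, which I have not verified against every version of the library. `get_counter` is a hypothetical helper, not part of mapreduce, and the sample numbers are made up for illustration.

```python
def get_counter(counters, name, default=0):
    """Return the total for one counter, or `default` if it is absent.

    Assumes `counters` is a dict of counter name -> integer total
    (an assumption about what the counters slot holds), or None/empty.
    """
    if not counters:
        return default
    return counters.get(name, default)


# Made-up sample data shaped like the assumed counters dict.
sample_counters = {"process-obj": 42, "mapper-calls": 42}

processed = get_counter(sample_counters, "process-obj")
missing = get_counter(sample_counters, "no-such-counter")
```

Inside finalized you would pass the slot's value instead of `sample_counters`. Falling back to a default keeps the code from raising KeyError when the mapper never incremented the counter, for example when the input was empty.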