How to set an output writer in MapReduce

Asked: 2011-09-09 15:09:52

Tags: google-app-engine mapreduce

I'm trying out the mapreduce framework from http://code.google.com/p/appengine-mapreduce/ and have modified the demo application slightly (using mapreduce.input_readers.DatastoreInputReader instead of mapreduce.input_readers.BlobstoreZipInputReader).

I have set up two pipeline classes:

class IndexPipeline(base_handler.PipelineBase):
    def run(self):
        output = yield mapreduce_pipeline.MapreducePipeline(
            "index",
            "main.index_map",  # defined higher up in the code
            "main.index_reduce",  # defined higher up in the code
            "mapreduce.input_readers.DatastoreInputReader",
            mapper_params={
                "entity_kind": "model.SearchRecords",
            },
            shards=16)
        yield StoreOutput("Index", output)

class StoreOutput(base_handler.PipelineBase):
    def run(self, mr_type, encoded_key):
        logging.info("output is %s %s" % (mr_type, str(encoded_key)))
        if encoded_key:
            key = db.Key(encoded=encoded_key)
            m = db.get(key)

            yield op.db.Put(m)

And I run it with:

pipeline = IndexPipeline()
pipeline.start()

But I keep getting this error:

Handler yielded two: ['a'] , but no output writer is set.

I tried to find where an output writer gets set somewhere in the source, but without success. The only thing I found is that an output_writer_class is supposed to be set somewhere.

Does anyone know how to set it?

On a side note, the encoded_key argument to StoreOutput always seems to be None.

1 answer:

Answer 0 (score: 0):

The output writer must be passed as an argument to mapreduce_pipeline.MapreducePipeline (see its docstring):

class MapreducePipeline(base_handler.PipelineBase):
  """Pipeline to execute MapReduce jobs.

  Args:
    job_name: job name as string.
    mapper_spec: specification of mapper to use.
    reducer_spec: specification of reducer to use.
    input_reader_spec: specification of input reader to read data from.
    output_writer_spec: specification of output writer to save reduce output to.
    mapper_params: parameters to use for mapper phase.
    reducer_params: parameters to use for reduce phase.
    shards: number of shards to use as int.
    combiner_spec: Optional. Specification of a combine function. If not
      supplied, no combine step will take place. The combine function takes a
      key, list of values and list of previously combined results. It yields
      combined values that might be processed by another combiner call, but will
      eventually end up in reducer. The combiner output key is assumed to be the
      same as the input key.

  Returns:
    filenames from output writer.
  """