Expressive way to compose generators in Python

Time: 2018-01-12 19:07:08

Tags: python generator function-composition

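I really like Python generators. In particular, I find that they are exactly the right tool for connecting to REST endpoints: my client code only has to iterate over the generator that is connected to the endpoint. However, I find that Python generators are not as expressive as I would like. Often I need to filter the data I get from the endpoint. In my current code, I pass a predicate function to the generator, which applies the predicate to the data it is processing and yields the data only if the predicate is True.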

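I would like to move toward composing generators, for example data_filter(datasource()). Here is some demo code showing what I have tried. It is clear why it doesn't work; what I am trying to figure out is the most expressive solution: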

# Mock of a REST endpoint: in the actual code, the generator is
# connected to a REST endpoint which returns a dictionary (from JSON).
def mock_datasource():
    mock_data = ["sanctuary", "movement", "liberty", "seminar",
                 "formula","short-circuit", "generate", "comedy"]
    for d in mock_data:
        yield d

# Mock of a filter: simplification, in reality I am filtering on some
# aspect of the data, like data['type'] == "external" 
def data_filter(d):
    if len(d) < 8:
        yield d

# First Try:
# for w in data_filter(mock_datasource()):
#     print(w)
# >> TypeError: object of type 'generator' has no len()

# Second Try 
# for w in (data_filter(d) for d in mock_datasource()):
#     print(w)
# I don't get words out, 
# rather <generator object data_filter at 0x101106a40>

# Using a predicate to filter works, but is not the expressive 
# composition I am after
for w in (d for d in mock_datasource() if len(d) < 8):
    print(w)

4 answers:

Answer 0 (score: 4)

data_filter should apply len to the elements of d, not to d itself, like this:

def data_filter(d):
    for x in d:
        if len(x) < 8:
            yield x

Now your code:

for w in data_filter(mock_datasource()):
    print(w)

prints:

liberty
seminar
formula
comedy
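Because this version of data_filter accepts any iterable and is itself a generator, filters of this shape nest naturally. The sketch below stacks it with a second filter, skip_s_filter, which is a hypothetical stage (not part of the answer) used only to illustrate the pattern:

```python
def mock_datasource():
    yield from ["sanctuary", "movement", "liberty", "seminar",
                "formula", "short-circuit", "generate", "comedy"]

def data_filter(d):
    # Keep only items shorter than 8 characters
    for x in d:
        if len(x) < 8:
            yield x

def skip_s_filter(d):
    # Hypothetical second stage: drop items that start with "s"
    for x in d:
        if not x.startswith("s"):
            yield x

# Items are pulled lazily through both stages, one at a time
for w in skip_s_filter(data_filter(mock_datasource())):
    print(w)  # prints liberty, formula, comedy (one per line)
```

Each stage only needs to agree on "iterable in, iterator out", so stages can be added or reordered without touching the data source.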

Answer 1 (score: 1)

More concisely, you can do this directly with a generator expression:

def length_filter(d, minlen=0, maxlen=8):
    return (x for x in d if minlen <= len(x) < maxlen)

Apply the filter to your generator just like a regular function:

for element in length_filter(mock_datasource()):
    print(element)

If your predicate is simple, the built-in function filter may also be all you need.
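A minimal sketch of that, reusing the question's mock data: the built-in filter takes a plain predicate and an iterable and returns a lazy iterator, so it composes with a generator without any extra generator function. The predicate name is_short is hypothetical; itertools.filterfalse is shown for the complementary case.

```python
from itertools import filterfalse

def mock_datasource():
    yield from ["sanctuary", "movement", "liberty", "seminar",
                "formula", "short-circuit", "generate", "comedy"]

def is_short(d):
    # Plain predicate: knows nothing about generators
    return len(d) < 8

# filter() is lazy, so items are pulled from the source on demand
short_words = filter(is_short, mock_datasource())
print(list(short_words))  # ['liberty', 'seminar', 'formula', 'comedy']

# filterfalse() keeps the items for which the predicate is false
long_words = filterfalse(is_short, mock_datasource())
print(list(long_words))  # ['sanctuary', 'movement', 'short-circuit', 'generate']
```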

Answer 2 (score: 0)

You can pass in a filter function that is applied to each item:

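def mock_datasource(filter_function):
    mock_data = ["sanctuary", "movement", "liberty", "seminar",
                 "formula", "short-circuit", "generate", "comedy"]

    for d in mock_data:
        # Yield only the items the filter function accepts
        if filter_function(d):
            yield d

def filter_function(d):
    # Predicate: return True to keep the item (here: short words)
    return len(d) < 8

print(list(mock_datasource(filter_function)))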
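A self-contained sketch of the same idea, in which the source yields only the items the function accepts. The predicate=None default, which lets the source run unfiltered, is a hypothetical addition rather than part of the answer:

```python
def mock_datasource(predicate=None):
    mock_data = ["sanctuary", "movement", "liberty", "seminar",
                 "formula", "short-circuit", "generate", "comedy"]
    for d in mock_data:
        # Hypothetical default: with no predicate, yield every item
        if predicate is None or predicate(d):
            yield d

print(list(mock_datasource(lambda d: len(d) < 8)))
# ['liberty', 'seminar', 'formula', 'comedy']
print(len(list(mock_datasource())))  # 8
```

The trade-off is that the filtering lives inside the source, so adding a second condition means changing the source's signature; this is the coupling the question describes as working but not the expressive composition it is after.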

Answer 3 (score: 0)

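What I would do is define filter(data_filter) so that it takes a generator as input and returns a generator of the values that pass the data_filter predicate (a regular predicate that knows nothing about the generator interface).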

The code is:

def filter(pred):
    """Filter, for composition with generators that take coll as an argument."""
    def generator(coll):
        for x in coll:
            if pred(x):
                yield x
    return generator

def mock_datasource():
    mock_data = ["sanctuary", "movement", "liberty", "seminar",
                 "formula","short-circuit", "generate", "comedy"]
    for d in mock_data:
        yield d

def data_filter(d):
    # Plain predicate: True for items shorter than 8 characters
    return len(d) < 8


gen1 = mock_datasource()
filtering = filter(data_filter)
gen2 = filtering(gen1) # or filter(data_filter)(mock_datasource())

print(list(gen2)) 
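One thing to keep in mind with this style: filtering is a reusable function, but each generator it returns can be consumed only once. A short sketch that restates the answer's filter under the hypothetical name keep (to avoid shadowing the built-in filter):

```python
def keep(pred):
    # Same shape as the answer's filter(pred): returns a generator function
    def generator(coll):
        for x in coll:
            if pred(x):
                yield x
    return generator

def mock_datasource():
    yield from ["sanctuary", "movement", "liberty", "seminar",
                "formula", "short-circuit", "generate", "comedy"]

filtering = keep(lambda d: len(d) < 8)

gen2 = filtering(mock_datasource())
print(list(gen2))  # ['liberty', 'seminar', 'formula', 'comedy']
print(list(gen2))  # [] because the generator is already exhausted

# The filtering function itself can be applied again to a fresh source
print(list(filtering(mock_datasource())))  # ['liberty', 'seminar', 'formula', 'comedy']
```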

If you want to go further, you can use compose, which is what I was really after:

from functools import reduce

def compose(*fns):
    """Compose functions left to right - allows generators to compose with same
    order as Clojure style transducers in first argument to transduce."""
    return reduce(lambda f,g: lambda *x, **kw: g(f(*x, **kw)), fns)

gen_factory = compose(mock_datasource, 
                      filter(data_filter))
gen = gen_factory()

print(list(gen))
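Because compose applies its arguments left to right, further stages can be appended in reading order. The sketch below restates compose and the mock source so it runs on its own; keep is the answer's PS2-style filter under a hypothetical name, and mapper is a hypothetical map stage written in the same curried shape:

```python
from functools import reduce

def compose(*fns):
    """Compose functions left to right."""
    return reduce(lambda f, g: lambda *x, **kw: g(f(*x, **kw)), fns)

def keep(pred):
    # Curried filter stage: iterable in, lazy generator out
    return lambda coll: (x for x in coll if pred(x))

def mapper(fn):
    # Hypothetical map stage with the same curried shape
    return lambda coll: (fn(x) for x in coll)

def mock_datasource():
    yield from ["sanctuary", "movement", "liberty", "seminar",
                "formula", "short-circuit", "generate", "comedy"]

gen_factory = compose(mock_datasource,
                      keep(lambda d: len(d) < 8),
                      mapper(str.upper))
print(list(gen_factory()))  # ['LIBERTY', 'SEMINAR', 'FORMULA', 'COMEDY']
```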

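PS: I used code I found here, where Clojure people express the composition of generators, inspired by the way transducers compose in general.

PS2: filter can be written in a more Pythonic way: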

def filter(pred):
    """Filter, for composition with generators that take coll as an argument."""
    return lambda coll: (x for x in coll if pred(x))
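The curried filter above behaves like the built-in filter with its first argument fixed, so functools.partial can produce an equivalent composable stage without a custom definition. A minimal sketch; is_short and short_only are hypothetical names, and builtins.filter is spelled out because the answer's own filter shadows the built-in name:

```python
import builtins
from functools import partial

def is_short(d):
    # Plain predicate: True for items shorter than 8 characters
    return len(d) < 8

def mock_datasource():
    yield from ["sanctuary", "movement", "liberty", "seminar",
                "formula", "short-circuit", "generate", "comedy"]

# partial fixes the predicate; the result takes an iterable and
# returns a lazy filter iterator
short_only = partial(builtins.filter, is_short)
print(list(short_only(mock_datasource())))  # ['liberty', 'seminar', 'formula', 'comedy']
```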