
Question

In R (thanks to dplyr) you can now perform operations with a more functional piping syntax via %>%. This means that instead of coding this:

> as.Date("2014-01-01")
> as.character((sqrt(12)^2))

you could also do this:

> "2014-01-01" %>% as.Date 
> 12 %>% sqrt %>% .^2 %>% as.character

To me this is more readable, and it extends to use cases beyond the data frame. Does the Python language have support for something similar?

Answer 1

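One possible way of doing this is by using a module called macropy. Macropy allows you to apply transformations to the code that you have written. Thus a | b can be transformed to b(a). This has a number of advantages and disadvantages.

In comparison to the solution mentioned by Sylvain Leroux, the main advantage is that you do not need to create infix objects for the functions you are interested in using - just mark the areas of code where you intend to use the transformation. Secondly, since the transformation is applied at compile time rather than at runtime, the transformed code suffers no overhead during runtime - all the work is done when the byte code is first produced from the source code.

The main disadvantage is that macropy has to be activated in a certain way for it to work (mentioned later). The price of the faster runtime is that parsing the source code is more computationally expensive, so the program will take longer to start. Finally, it adds a syntactic style, which means programmers who are not familiar with macropy may find your code harder to understand.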

Example code:

run.py

import macropy.activate 
# Activates macropy, modules using macropy cannot be imported before this statement
# in the program.
import target
# import the module using macropy

target.py

from fpipe import macros, fpipe
from macropy.quick_lambda import macros, f
# The `from module import macros, ...` must be used for macropy to know which 
# macros it should apply to your code.
# Here two macros have been imported `fpipe`, which does what you want
# and `f` which provides a quicker way to write lambdas.

from math import sqrt

# Using the fpipe macro in a single expression.
# The code between the square braces is interpreted as - str(sqrt(12))
print fpipe[12 | sqrt | str] # prints 3.46410161514

# using a decorator
# All code within the function is examined for `x | y` constructs.
x = 1 # global variable
@fpipe
def sum_range_then_square():
    "expected value (1 + 2 + 3)**2 -> 36"
    y = 4 # local variable
    return range(x, y) | sum | f[_**2]
    # `f[_**2]` is macropy syntax for -- `lambda x: x**2`, which would also work here

print sum_range_then_square() # prints 36

# using a with block.
# same as a decorator, but for limited blocks.
with fpipe:
    print range(4) | sum # prints 6
    print 'a b c' | f[_.split()] # prints ['a', 'b', 'c']

And finally the module that does the hard work. I've called it fpipe, for functional pipe, as it emulates the shell syntax for passing output from one process to another.

fpipe.py

from macropy.core.macros import *
from macropy.core.quotes import macros, q, ast

macros = Macros()

@macros.decorator
@macros.block
@macros.expr
def fpipe(tree, **kw):

    @Walker
    def pipe_search(tree, stop, **kw):
        """Search code for bitwise or operators and transform `a | b` to `b(a)`."""
        if isinstance(tree, BinOp) and isinstance(tree.op, BitOr):
            operand = tree.left
            function = tree.right
            newtree = q[ast[function](ast[operand])]
            return newtree

    return pipe_search.recurse(tree)
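
For readers who do not have macropy installed, the same `a | b` to `b(a)` rewrite can be sketched with the standard-library ast module. This is only an illustration of the idea, not macropy's implementation: the names `PipeToCall` and `eval_piped` are made up here, and the sketch handles a single expression passed as a string rather than decorated functions or with blocks.

```python
import ast
from math import sqrt


class PipeToCall(ast.NodeTransformer):
    """Rewrite every `a | b` in an expression into the call `b(a)`."""

    def visit_BinOp(self, node):
        self.generic_visit(node)  # rewrite nested pipes in the operands first
        if isinstance(node.op, ast.BitOr):
            call = ast.Call(func=node.right, args=[node.left], keywords=[])
            return ast.copy_location(call, node)
        return node


def eval_piped(source, namespace):
    """Parse, rewrite, compile and evaluate one expression (hypothetical helper)."""
    tree = PipeToCall().visit(ast.parse(source, mode="eval"))
    ast.fix_missing_locations(tree)
    return eval(compile(tree, "<piped>", "eval"), namespace)


# Evaluated as str(sqrt(12)) after the rewrite.
result = eval_piped("12 | sqrt | str", {"sqrt": sqrt})
print(result)
```

As with the macro, a genuine bitwise or inside the rewritten expression is no longer available, which is one reason to limit the transformation to clearly marked code.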

Answer 2

Pipes are a new feature in Pandas 0.16.2.

Example:

import pandas as pd
from sklearn.datasets import load_iris

x = load_iris()
x = pd.DataFrame(x.data, columns=x.feature_names)

def remove_units(df):
    df.columns = pd.Index(map(lambda x: x.replace(" (cm)", ""), df.columns))
    return df

def length_times_width(df):
    df['sepal length*width'] = df['sepal length'] * df['sepal width']
    df['petal length*width'] = df['petal length'] * df['petal width']

x.pipe(remove_units).pipe(length_times_width)
x

NB: The Pandas version retains Python's reference semantics. That is why length_times_width doesn't need a return value; it modifies x in place.
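
One consequence of that note: because length_times_width returns nothing, the whole chain evaluates to None, and it only works because x was mutated along the way. The sketch below uses a made-up `Table` class (a plain dict with a `pipe` method, standing in for a DataFrame so that it runs without pandas) to show that returning the object from every step keeps the chain extendable.

```python
class Table(dict):
    """Hypothetical stand-in for a DataFrame: column name -> list of values."""

    def pipe(self, func, *args, **kwargs):
        # Same call shape as in the example above: func receives the object first.
        return func(self, *args, **kwargs)


def add_area(table):
    table["area"] = [l * w for l, w in zip(table["length"], table["width"])]
    return table  # returning the table lets another .pipe() follow


def double_area(table):
    table["area"] = [2 * a for a in table["area"]]
    return table


t = Table(length=[1, 2], width=[3, 4])
result = t.pipe(add_area).pipe(double_area)
print(result is t, result["area"])
```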

Answer 3

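Does the python language have support for something similar?

"more functional piping syntax": is this really a more "functional" syntax? I would say it adds an "infix" syntax to R instead.

That being said, Python's grammar does not have direct support for infix notation beyond the standard operators.

If you really need something like that, you should take that code from Tomer Filiba as a starting point to implement your own infix notation:

Code sample and comments by Tomer Filiba (http://tomerfiliba.com/blog/Infix-Operators/):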
from functools import partial

class Infix(object):
    def __init__(self, func):
        self.func = func
    def __or__(self, other):
        return self.func(other)
    def __ror__(self, other):
        return Infix(partial(self.func, other))
    def __call__(self, v1, v2):
        return self.func(v1, v2)
Using instances of this peculiar class, we can now use a new "syntax" for calling functions as infix operators:
>>> @Infix
... def add(x, y):
...     return x + y
...
>>> 5 |add| 6
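
To see why `5 |add| 6` works, it can help to split the expression into its two operator calls. The script below repeats the Infix class so that it runs on its own; the intermediate name `step1` is only there for illustration.

```python
from functools import partial


class Infix(object):
    def __init__(self, func):
        self.func = func

    def __or__(self, other):
        return self.func(other)

    def __ror__(self, other):
        return Infix(partial(self.func, other))

    def __call__(self, v1, v2):
        return self.func(v1, v2)


@Infix
def add(x, y):
    return x + y


# 5 | add: int does not know how to "or" with an Infix, so Python falls back
# to add.__ror__(5), which stores 5 as the first argument.
step1 = 5 | add

# step1 | 6: Infix.__or__ supplies the second argument and calls the function.
result = step1 | 6
print(result, add(5, 6))
```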

Answer 4

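PyToolz [doc] allows arbitrarily composable pipes, just they aren't defined with that pipe-operator syntax.

Follow the above link for the quickstart. And here's a video tutorial: http://pyvideo.org/video/2858/functional-programming-in-python-with-pytoolz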

In [1]: from toolz import pipe

In [2]: from math import sqrt

In [3]: pipe(12, sqrt, str)
Out[3]: '3.4641016151377544'
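
The pipe call above applies the functions immediately. When the same chain is needed for many inputs, it can be built once and reused. Below is a standard-library sketch of that idea; `pipeline` is a name chosen here for illustration and is not claimed to be part of toolz.

```python
from functools import reduce
from math import sqrt


def pipeline(*funcs):
    """Return a function that feeds its argument through funcs, left to right."""
    def run(value):
        return reduce(lambda acc, func: func(acc), funcs, value)
    return run


sqrt_as_text = pipeline(sqrt, str)
results = [sqrt_as_text(n) for n in (4, 9, 12)]
print(results)
```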

Answer 5

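If you just want this for personal scripting, you might want to consider using Coconut instead of Python.

Coconut is a superset of Python. You could therefore use Coconut's pipe operator |>, while completely ignoring the rest of the Coconut language.

For example: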

def addone(x):
    return x + 1

3 |> addone

compiles to

# lots of auto-generated header junk

# Compiled Coconut: -----------------------------------------------------------

def addone(x):
    return x + 1

(addone)(3)

Answer 6

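There is no need for third-party libraries or confusing operator trickery to implement a pipe function - you can get the basics going quite easily yourself.

Let's start by defining what a pipe function actually is. At its heart, it is just a way to express a series of function calls in logical order, rather than the standard "inside out" order.

For example, let's look at these functions: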

def one(value):
  return value

def two(value):
  return 2*value

def three(value):
  return 3*value

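Not very interesting, but assume interesting things are happening to value. We want to call them in order, passing the output of each to the next. In vanilla Python that would be: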

result = three(two(one(1)))

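It is not very readable, and for more complex pipelines it is going to get worse. So here is a simple pipe function that takes an initial argument and the series of functions to apply to it: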

def pipe(first, *args):
  for fn in args:
    first = fn(first)
  return first

Let's call it:

result = pipe(1, one, two, three)

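That looks like very readable "pipe" syntax to me :). I don't see how it is any less readable than overloading operators or anything like that. In fact, I would argue that it is more readable Python code.

Here is the humble pipe solving the OP's examples: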

from math import sqrt
from datetime import datetime

def as_date(s):
  return datetime.strptime(s, '%Y-%m-%d')

def as_character(value):
  # Do whatever as.character does
  return value

pipe("2014-01-01", as_date)
pipe(12, sqrt, lambda x: x**2, as_character)
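
Steps that need more than one argument can still be used with this pipe by fixing the extra arguments first. The sketch below does that with functools.partial; the two-argument `as_date` and the use of str in place of as_character are assumptions made for this example.

```python
from datetime import datetime
from functools import partial
from math import sqrt


def pipe(first, *args):
    for fn in args:
        first = fn(first)
    return first


def as_date(date_string, fmt):
    # Hypothetical two-argument variant of as_date from the answer above.
    return datetime.strptime(date_string, fmt).date()


# partial fixes fmt, leaving a one-argument function that fits into the pipe.
parsed = pipe("2014-01-01", partial(as_date, fmt="%Y-%m-%d"))
text = pipe(12, sqrt, lambda x: x ** 2, str)
print(parsed, text)
```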

Answer 7

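Building pipe with Infix

As hinted at by Sylvain Leroux, we can use the Infix operator to construct an infix pipe. Let's see how this is accomplished.

First, here is the code from Tomer Filiba

Code sample and comments by Tomer Filiba (http://tomerfiliba.com/blog/Infix-Operators/):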
from functools import partial

class Infix(object):
    def __init__(self, func):
        self.func = func
    def __or__(self, other):
        return self.func(other)
    def __ror__(self, other):
        return Infix(partial(self.func, other))
    def __call__(self, v1, v2):
        return self.func(v1, v2)
Using instances of this peculiar class, we can now use a new "syntax" for calling functions as infix operators:
>>> @Infix
... def add(x, y):
...     return x + y
...
>>> 5 |add| 6

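The pipe operator passes the preceding object as an argument to the object that follows the pipe, so x %>% f can be transformed into f(x). Consequently, the pipe operator can be defined using Infix as follows: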

In [1]: @Infix
   ...: def pipe(x, f):
   ...:     return f(x)
   ...:
   ...:

In [2]: from math import sqrt

In [3]: 12 |pipe| sqrt |pipe| str
Out[3]: '3.4641016151377544'

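A note on partial application

The %>% operator from dplyr pushes arguments through the first argument in a function, so

df %>% 
filter(x >= 2) %>%
mutate(y = 2*x)

corresponds to

df1 <- filter(df, x >= 2)
df2 <- mutate(df1, y = 2*x)

The easiest way to achieve something similar in Python is to use currying. The toolz library provides a curry decorator function that makes constructing curried functions easy.

In [2]: from toolz import curry

In [3]: from datetime import datetime

In [4]: @curry
   ...: def asDate(format, date_string):
   ...:     return datetime.strptime(date_string, format)
   ...:
   ...:

In [5]: "2014-01-01" |pipe| asDate("%Y-%m-%d")
Out[5]: datetime.datetime(2014, 1, 1, 0, 0)

Notice that |pipe| pushes the arguments into the last argument position, that is

x |pipe| f(2)

corresponds to

f(2, x)

When designing curried functions, static arguments (i.e. arguments that might be used for many examples) should be placed earlier in the parameter list.

Note that toolz includes many pre-curried functions, including various functions from the operator module.

In [11]: from toolz.curried import map

In [12]: from toolz.curried.operator import add

In [13]: range(5) |pipe| map(add(2)) |pipe| list
Out[13]: [2, 3, 4, 5, 6]

which corresponds roughly to the following in R

> library(dplyr)
> add2 <- function(x) {x + 2}
> 0:4 %>% sapply(add2)
[1] 2 3 4 5 6

Using other infix delimiters

You can change the symbols that surround the Infix invocation by overriding other Python operator methods. For example, switching __or__ and __ror__ to __mod__ and __rmod__ will change the | operator to the mod operator (%).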
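
A minimal sketch of that switch to __mod__ and __rmod__ follows. `ModInfix` is a name chosen here for illustration; it is the Infix class from above with only the two operator methods renamed.

```python
from functools import partial
from math import sqrt


class ModInfix(object):
    """Infix variant that is written between % signs instead of | signs."""

    def __init__(self, func):
        self.func = func

    def __mod__(self, other):
        return self.func(other)

    def __rmod__(self, other):
        return ModInfix(partial(self.func, other))


@ModInfix
def pipe(x, f):
    return f(x)


result = 12 %pipe% sqrt %pipe% str
print(result)
```

One caveat with this variant: a left operand whose own type already handles % never reaches __rmod__. A str on the left, for instance, treats % as string formatting, so it is worth trying the chosen operator on the value types you expect to pipe.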

Answer 8

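I missed the |> pipe operator from Elixir, so I created a simple function decorator (about 50 lines of code) that reinterprets the >> Python right shift operator as a very Elixir-like pipe at compile time, using the ast library and compile/exec: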

from pipeop import pipes

def add3(a, b, c):
    return a + b + c

def times(a, b):
    return a * b

@pipes
def calc():
    print(1 >> add3(2, 3) >> times(4))  # prints 24

All it does is rewrite a >> b(...) as b(a, ...).

https://pypi.org/project/pipeop/

https://github.com/robinhilliard/pipes
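
The rewrite of a >> b(...) into b(a, ...) can be sketched on a single expression with the standard-library ast module. This is not the source of pipeop: `ShiftToFirstArg` is a made-up name, and the sketch works on an expression string instead of decorating a function.

```python
import ast


def add3(a, b, c):
    return a + b + c


def times(a, b):
    return a * b


class ShiftToFirstArg(ast.NodeTransformer):
    """Turn `a >> f(x, y)` into `f(a, x, y)`; other uses of >> are left alone."""

    def visit_BinOp(self, node):
        self.generic_visit(node)  # handle the left-hand chain first
        if isinstance(node.op, ast.RShift) and isinstance(node.right, ast.Call):
            node.right.args.insert(0, node.left)
            return node.right
        return node


tree = ShiftToFirstArg().visit(ast.parse("1 >> add3(2, 3) >> times(4)", mode="eval"))
ast.fix_missing_locations(tree)
result = eval(compile(tree, "<pipes>", "eval"), {"add3": add3, "times": times})
print(result)  # times(add3(1, 2, 3), 4)
```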

Answer 9

Adding my 2c. I personally use the package fn for functional-style programming. Your example translates into

from fn import F, _
from math import sqrt

(F(sqrt) >> _**2 >> str)(12)

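F is a wrapper class with functional-style syntactic sugar for partial application and composition. _ is a Scala-style constructor for anonymous functions (similar to Python's lambda); it represents a variable, hence you can combine several _ objects in one expression to get a function with more arguments (e.g. _ + _ is equivalent to lambda a, b: a + b). F(sqrt) >> _**2 >> str results in a Callable object that can be used as many times as you want.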
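
To illustrate how >> can build a reusable callable, here is a standard-library sketch of left-to-right composition. `Compose` is a hypothetical stand-in written for this example, not fn's F class, and a plain lambda takes the place of the _ shortcut.

```python
from math import sqrt


class Compose(object):
    """Wrap a function so that `>>` appends another function to the chain."""

    def __init__(self, func=lambda value: value):
        self.func = func

    def __rshift__(self, other):
        inner = self.func
        return Compose(lambda *args, **kwargs: other(inner(*args, **kwargs)))

    def __call__(self, *args, **kwargs):
        return self.func(*args, **kwargs)


to_text = Compose(sqrt) >> (lambda x: x ** 2) >> str
result = to_text(12)
print(result, to_text(4))
```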

Answer 10

You can use the sspipe library. It exposes two objects, p and px. Similar to x %>% f(y,z), you can write x | p(f, y, z), and similar to x %>% .^2, you can write x | px**2.

Answer 11

An alternative solution would be to use the workflow tool dask. Though it's not as syntactically fun as...

var
| do this
| then do that

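...it still allows your variable to flow down the chain, and using dask gives the added benefit of parallelization where possible.

Here's how I use dask to accomplish a pipe-chain pattern: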

import dask

def a(foo):
    return foo + 1
def b(foo):
    return foo / 2
def c(foo,bar):
    return foo + bar

# pattern = 'name_of_behavior': (method_to_call, variables_to_pass_in, variables_can_be_task_names)
workflow = {'a_task':(a,1),
            'b_task':(b,'a_task',),
            'c_task':(c,99,'b_task'),}

#dask.visualize(workflow) #visualization available. 

dask.get(workflow,'c_task')

# returns 100
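
To make the task-graph format easier to follow, here is a toy evaluator written with the standard library only. It is not how dask schedules work (no parallelism, no caching), and the function name `get` is reused here purely for illustration: each tuple is (function, arguments...), and any string argument that names another task is replaced by that task's result.

```python
def a(foo):
    return foo + 1


def b(foo):
    return foo / 2


def c(foo, bar):
    return foo + bar


def get(graph, key):
    """Recursively evaluate one task of a dict-based task graph (toy version)."""
    func, *args = graph[key]
    resolved = [
        get(graph, arg) if isinstance(arg, str) and arg in graph else arg
        for arg in args
    ]
    return func(*resolved)


workflow = {'a_task': (a, 1),
            'b_task': (b, 'a_task'),
            'c_task': (c, 99, 'b_task')}

result = get(workflow, 'c_task')  # c(99, b(a(1)))
print(result)
```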

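After having worked with Elixir, I wanted to use the piping pattern in Python. This isn't exactly the same pattern, but it's similar and, like I said, comes with the added benefit of parallelization: if you tell dask to get a task in your workflow that isn't dependent on others running first, they'll run in parallel.

If you wanted an easier syntax, you could wrap it in something that takes care of naming the tasks for you. Of course, in this situation you'd need all functions to take the piped value as the first argument, and you'd lose any benefit of parallelization. But if you're OK with that, you could do something like this: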

def dask_pipe(initial_var, functions_args):
    '''
    call the dask_pipe with an init_var, and a list of functions
    workflow, last_task = dask_pipe(initial_var, {function_1:[], function_2:[arg1, arg2]})
    workflow, last_task = dask_pipe(initial_var, [function_1, function_2])
    dask.get(workflow, last_task)
    '''
    workflow = {}
    if isinstance(functions_args, list):
        for ix, function in enumerate(functions_args):
            if ix == 0:
                workflow['task_' + str(ix)] = (function, initial_var)
            else:
                workflow['task_' + str(ix)] = (function, 'task_' + str(ix - 1))
        return workflow, 'task_' + str(ix)
    elif isinstance(functions_args, dict):
        for ix, (function, args) in enumerate(functions_args.items()):
            if ix == 0:
                workflow['task_' + str(ix)] = (function, initial_var, *args)
            else:
                workflow['task_' + str(ix)] = (function, 'task_' + str(ix - 1), *args )
        return workflow, 'task_' + str(ix)

# piped functions
def foo(df):
    return df[['a','b']]
def bar(df, s1, s2):
    return df.columns.tolist() + [s1, s2]
def baz(df):
    return df.columns.tolist()

# setup 
import dask
import pandas as pd
df = pd.DataFrame({'a':[1,2,3],'b':[1,2,3],'c':[1,2,3]})

Now, with this wrapper, you can make a pipe following either of these syntactical patterns:

# wf, lt = dask_pipe(initial_var, [function_1, function_2])
# wf, lt = dask_pipe(initial_var, {function_1:[], function_2:[arg1, arg2]})

like this:

# test 1 - lists for functions only:
workflow, last_task =  dask_pipe(df, [foo, baz])
print(dask.get(workflow, last_task)) # returns ['a','b']

# test 2 - dictionary for args:
workflow, last_task = dask_pipe(df, {foo:[], bar:['string1', 'string2']})
print(dask.get(workflow, last_task)) # returns ['a','b','string1','string2']

Answer 12

The pipe module from https://pypi.org/project/pipe/ overloads the | operator very nicely and provides a lot of pipe functions such as add, first, where, tail, etc.

>>> [1, 2, 3, 4] | where(lambda x: x % 2 == 0) | add
6

>>> sum([1, [2, 3], 4] | traverse)
10

Plus it's very easy to write your own pipe functions

from math import sqrt
from pipe import Pipe

@Pipe
def p_sqrt(x):
    return sqrt(x)

@Pipe
def p_pr(x):
    print(x)

9 | p_sqrt | p_pr
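
For a sense of what such a decorator can look like, here is a simplified stand-in built on __ror__. It is a sketch under the assumption that reflected | plus a __call__ that binds extra arguments is enough for these examples; it is not the pipe module's actual source, and the `where` and `add` below are re-implemented here for illustration.

```python
from math import sqrt


class Pipe(object):
    """Simplified stand-in: `value | Pipe(f)` calls f(value)."""

    def __init__(self, function):
        self.function = function

    def __ror__(self, other):
        return self.function(other)

    def __call__(self, *args, **kwargs):
        # Bind extra arguments now; the piped value arrives later as the first one.
        return Pipe(lambda value: self.function(value, *args, **kwargs))


@Pipe
def where(iterable, predicate):
    return (item for item in iterable if predicate(item))


@Pipe
def add(iterable):
    return sum(iterable)


@Pipe
def p_sqrt(x):
    return sqrt(x)


total = [1, 2, 3, 4] | where(lambda x: x % 2 == 0) | add
root = 9 | p_sqrt
print(total, root)
```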

Answer 13

Just use cool.

First, run python -m pip install cool. Then, run python.

from cool import F

range(10) | F(filter, lambda x: x % 2) | F(sum) == 25

You can read https://github.com/abersheeran/cool for more usage.

Answer 14

There is the dfply module. You can find more information at

https://github.com/kieferk/dfply

Some examples are:

from dfply import *
diamonds >> group_by('cut') >> row_slice(5)
diamonds >> distinct(X.color)
diamonds >> filter_by(X.cut == 'Ideal', X.color == 'E', X.table < 55, X.price < 500)
diamonds >> mutate(x_plus_y=X.x + X.y, y_div_z=(X.y / X.z)) >> select(columns_from('x')) >> head(3)

Answer 15

Pipe functionality can be achieved by composing pandas methods with the dot. Here is an example.

Load a sample data frame:

import seaborn    
iris = seaborn.load_dataset("iris")
type(iris)
# <class 'pandas.core.frame.DataFrame'>

Illustrate the composition of pandas methods with the dot:

(iris.query("species == 'setosa'")
     .sort_values("petal_width")
     .head())

You can add new methods to the pandas data frame if needed (as is done here, for example):

pandas.DataFrame.new_method = new_method

Answer 16

My two cents, inspired by http://tomerfiliba.com/blog/Infix-Operators/

class FuncPipe:
  class Arg:
    def __init__(self, arg):
      self.arg = arg
    def __or__(self, func):
      return func(self.arg)

  def __ror__(self, arg):
    return self.Arg(arg)
pipe = FuncPipe()

Then

1 |pipe| \
  (lambda x: x+1) |pipe| \
  (lambda x: 2*x)

returns 4.

Functional pipes in Python like %>% from R's dplyr
