Question

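I read the pandas documentation on the apply function, which says that apply works on the rows or columns of a DataFrame and returns a Series or a DataFrame. Is it possible to write the code so that it returns a scalar? Or do I have to chain it further with the .pipe function? I tried writing the following functions against the example DataFrame from the documentation: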

import numpy as np
import pandas as pd

df = pd.DataFrame({'one' : pd.Series(np.random.randn(3), index=['a', 'b', 'c']),
    'two' : pd.Series(np.random.randn(4), index=['a', 'b', 'c', 'd']),
    'three' : pd.Series(np.random.randn(3), index=['b', 'c', 'd'])})

def my_func1(x):
    min_of_x = x[['one', 'two']]
    return min_of_x['one']

def my_func2(x):
    min_of_x = x[['one', 'two']]
    return min_of_x['one'].iloc[0]

def my_func3(x):
    min_of_x = x[['one', 'two']]
    return min_of_x.max()

def my_func4(x, elem_pos=0):
    return x.iloc[elem_pos]

When I run:

df.apply(my_func1, axis=1)

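It works fine and gives me a Series, as expected. But suppose I want the first element, or want the function to compute a scalar from the Series values: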

df.apply(my_func2, axis=1)

I get the error "AttributeError: ("'numpy.float64' object has no attribute 'iloc'", 'occurred at index a')". If I use my_func3 to compute the max:

df.apply(my_func3, axis=1)

Again it returns a Series. The only way to get a scalar seems to be chaining another function with .pipe:

df.apply(my_func1, axis=1).pipe(my_func4, 2)

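So I'd like to confirm my conclusion: does apply only ever produce a Series or a DataFrame, with any attempt to return some other kind of value producing this error? Is that the case? I ask because I want to run some calculations on the results that can't be done with the built-in pandas and NumPy functions.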

Answer 1

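The fundamental point to understand is that apply always passes a pd.Series object to your function. Which Series it passes depends on the axis you call it with.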

For example, axis=1 passes each row, like this:

one      ...
three    ...
two      ...
Name: a/b/c/d, dtype: float64

And axis=0 passes each column, like this:

a    ...
b    ...
c    ...
d    ...
dtype: float64

In either case, the argument is a pd.Series object.
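A minimal sketch for checking this yourself, assuming a reasonably recent pandas with the default raw=False. The helper record_input and the list seen are hypothetical names used only for illustration; the function records the type and .name (row or column label) of every object apply hands it.

```python
import numpy as np
import pandas as pd

# Same layout as the question's DataFrame (random values, NaN where a label is missing)
df = pd.DataFrame({'one': pd.Series(np.random.randn(3), index=['a', 'b', 'c']),
                   'two': pd.Series(np.random.randn(4), index=['a', 'b', 'c', 'd']),
                   'three': pd.Series(np.random.randn(3), index=['b', 'c', 'd'])})

seen = []  # hypothetical log of what apply passed in

def record_input(x):
    # Record the type of the argument and its .name (a row label or a column label)
    seen.append((type(x).__name__, x.name))
    return 0  # any scalar works here; only the input matters

df.apply(record_input, axis=1)   # called once per row
row_inputs = seen.copy()
seen.clear()

df.apply(record_input, axis=0)   # called once per column
col_inputs = seen.copy()

print(row_inputs)  # every entry is ('Series', <row label>)
print(col_inputs)  # every entry is ('Series', <column label>)
```

Returning a constant keeps the example focused on the input side: apply still builds a Series of those returned values, but here only the recorded inputs are inspected.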

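In my_func1, you slice the Series with x[['one', 'two']], which also produces a Series object. Indexing a single item (for example x['one'], or min_of_x['one']) returns a float object (specifically numpy.float64), and a float naturally has no .iloc attribute. That is why my_func2 raises an AttributeError. In other words, my_func1 already returns a scalar for each row; apply simply collects those per-row scalars into a Series.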
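A small sketch of that point, using fixed values instead of random ones so the result is reproducible (the values are made up for illustration). It shows that my_func1 returns a scalar for a single row, that apply gathers one scalar per row into a Series, and that .iloc belongs on the result of apply rather than inside the function.

```python
import numpy as np
import pandas as pd

# Illustrative fixed values; same index/column layout as the question's DataFrame
df = pd.DataFrame({'one': pd.Series([1.0, 2.0, 3.0], index=['a', 'b', 'c']),
                   'two': pd.Series([4.0, 5.0, 6.0, 7.0], index=['a', 'b', 'c', 'd']),
                   'three': pd.Series([8.0, 9.0, 10.0], index=['b', 'c', 'd'])})

def my_func1(x):
    min_of_x = x[['one', 'two']]
    return min_of_x['one']  # already a scalar (numpy.float64, or NaN for row 'd')

# Calling it on one row directly shows it returns a scalar, not a Series
single = my_func1(df.loc['a'])
print(np.isscalar(single))  # True

# apply calls my_func1 once per row and collects the scalars into a Series
per_row = df.apply(my_func1, axis=1)

# To get one scalar, index the result *after* apply, not inside the function
first_value = per_row.iloc[0]
print(first_value)  # 1.0
```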

As an exercise, try running this code:

def my_func1(x):
    print(type(x['one']))
    min_of_x = x[['one', 'two']]
    return min_of_x['one']

df.apply(my_func1, axis=1)

This gives:

<class 'numpy.float64'>
<class 'numpy.float64'>
<class 'numpy.float64'>
<class 'numpy.float64'>
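Because apply calls the function once per row or column, its result has one entry per row or column. If a single scalar for the whole DataFrame is wanted, one option (sketched here under that assumption) is to hand the whole DataFrame to the function, either directly or via DataFrame.pipe, which calls func(df, *args) and returns whatever func returns. The helper max_of_one_and_two is hypothetical; my_func1 and my_func4 are the question's functions, and the values are made up for reproducibility.

```python
import pandas as pd

df = pd.DataFrame({'one': pd.Series([1.0, 2.0, 3.0], index=['a', 'b', 'c']),
                   'two': pd.Series([4.0, 5.0, 6.0, 7.0], index=['a', 'b', 'c', 'd']),
                   'three': pd.Series([8.0, 9.0, 10.0], index=['b', 'c', 'd'])})

def my_func1(x):
    return x[['one', 'two']]['one']

def my_func4(x, elem_pos=0):
    return x.iloc[elem_pos]

def max_of_one_and_two(frame):
    # Hypothetical helper: receives the whole DataFrame, not one row
    # Column maxima first, then the overall max (NaNs are skipped by default)
    return frame[['one', 'two']].max().max()

overall = df.pipe(max_of_one_and_two)   # equivalent to max_of_one_and_two(df)
print(overall)  # 7.0

# The question's chain: pipe(my_func4, 2) is just my_func4(result, 2), i.e. result.iloc[2]
per_row = df.apply(my_func1, axis=1)
third = per_row.pipe(my_func4, 2)
print(third)  # 3.0
```

So .pipe is not required to obtain a scalar; it is only a readable way to call a function on the object at the end of a method chain.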

Can the apply function on a pandas DataFrame produce a scalar?