Question

I think I have found an anomaly in the pandas.Series.agg method.

Here is what I found.

>>> v = pd.Series([172, 172, 170.0, 170., 168.])
>>> 
>>> v.agg(np.mean)
170.4
>>> 
>>> v.agg(lambda x: np.mean(x))
0    172.0
1    172.0
2    170.0
3    170.0
4    168.0
dtype: float64
>>> 
>>> np.mean(v)
170.4

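I find this frustrating, because lambda x: f(x) should behave the same as f, right? According to the documentation, the input to .agg(func) is the Series, but the output suggests otherwise. Here is another output.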

>>> v.agg(lambda x: print(type(x)))
<class 'float'>
<class 'float'>
<class 'float'>
<class 'float'>
<class 'float'>
0    None
1    None
0    None
1    None
0    None
dtype: object

>>> v.agg(lambda x: print(x.tolist()))
[172.0, 172.0, 170.0, 170.0, 168.0]

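What? The output shows that the input to lambda x: differs between the two cases above. I was not sure that was even possible.

Here is what I got from further investigation.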

>>> v.agg(lambda x: np.mean(x))
0    172.0
1    172.0
0    170.0
1    170.0
0    168.0
dtype: float64
>>> v.groupby(level=0).agg(lambda x: np.mean(x))
0    170.0
1    171.0
dtype: float64
>>> v.agg(lambda x: np.mean(x.tolist()))
170.4

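At least lambda x: np.mean(x) works as expected on a grouped Series! But the mystery remains. Can anyone help me clarify what is happening here?

I also tested pandas.DataFrame.agg with lambda x: np.mean(x), and it works as expected!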

>>> pd.DataFrame(v)
       0
0  172.0
1  172.0
0  170.0
1  170.0
0  168.0
>>> pd.DataFrame(v).agg(lambda x: np.mean(x))
0    170.4
dtype: float64

====

In summary, my question is this: the following two give the same result.

v.groupby(by = [0]*len(v)).agg(np.mean)
v.groupby(by = [0]*len(v)).agg(lambda x: np.mean(x))

But the following two do not. Isn't this considered an inconsistency in pandas?

v.agg(np.mean)
v.agg(lambda x: np.mean(x))

Where does it come from?

Answer 1

>>> pd.DataFrame(v).agg(lambda x: np.mean(x))
0    170.4

The above looks fine to you because it is applied along axis=0, but if you pass axis=1 you get the same result as with the Series:

>>> pd.DataFrame(v).agg(lambda x: np.mean(x), axis=1)
0    172.0
1    172.0
2    170.0
3    170.0
4    168.0
dtype: float64

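The thing is, DataFrames have two axes, 0 and 1, while a Series has only one axis, 0. When you do v.agg(lambda x: np.mean(x)), the function is applied to each individual value of the Series separately, which is similar to pandas.Series.apply, whereas when you do v.agg(np.mean), it is applied to the whole Series.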

Looking at the docs for pandas.Series.agg:

Parameters: func: function, str, list or dict
Function to use for aggregating the data. If a function, must either work when passed a Series or when passed to Series.apply.
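The distinction drawn in this answer can be checked with calls whose behavior does not depend on how agg treats a lambda. The following is a minimal sketch reusing v from the question; the name spread and the lambda that reads .iloc are illustrative assumptions, not part of the original post.

```python
import math

import numpy as np
import pandas as pd

v = pd.Series([172, 172, 170.0, 170.0, 168.0])

# Series.apply calls the function once per element, so np.mean receives a
# single float each time and simply hands that value back.
elementwise = v.apply(lambda x: np.mean(x))

# Calling the function on the Series itself reduces all values to one number.
whole = np.mean(v)

# Naming the reduction as a string asks agg for the Series-level mean.
by_name = v.agg("mean")

# Illustrative assumption: a function that only works on a Series
# (a float has no .iloc) cannot run per element, so agg ends up
# evaluating it on the whole Series.
spread = v.agg(lambda s: s.iloc[0] - s.iloc[-1])

print(elementwise.tolist())
print(whole, by_name, spread)
```

If the goal is a single aggregated value, passing the string "mean" or calling v.mean() avoids the ambiguity altogether. How Series.agg treats a plain callable may differ between pandas releases, so it is worth checking against the installed version.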

Answer 2

Look at the source of agg. The method checks whether the passed function is vectorized, in which case it returns a scalar; if not, it applies the function to each row.

This is why you get errors when combining transformation and aggregation functions:

        # try a regular apply, this evaluates lambdas
        # row-by-row; however if the lambda is expected a Series
        # expression, e.g.: lambda x: x-x.quantile(0.25)
        # this will fail, so we can try a vectorized evaluation

        # we cannot FIRST try the vectorized evaluation, because
        # then .agg and .apply would have different semantics if the
        # operation is actually defined on the Series, e.g. str
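The order of evaluation that these source comments describe can be modelled without pandas. The sketch below is a pure-Python approximation, not the library's code: agg_sketch and scalar_tolerant_mean are hypothetical names, and the set of exceptions that triggers the fallback is an assumption of this sketch.

```python
from statistics import mean


def agg_sketch(values, func):
    """Hypothetical model of the fallback described in the comments above."""
    try:
        # First attempt: evaluate func element by element, like Series.apply.
        return [func(x) for x in values]
    except (ValueError, AttributeError, TypeError):
        # func could not handle a single element, so evaluate it once
        # on the whole collection instead.
        return func(values)


def scalar_tolerant_mean(x):
    """Stand-in for np.mean, which accepts a scalar as well as a sequence."""
    return mean(x) if isinstance(x, list) else x


values = [172.0, 172.0, 170.0, 170.0, 168.0]

# Works on a single float, so the elementwise attempt succeeds and
# no aggregation takes place.
print(agg_sketch(values, scalar_tolerant_mean))

# statistics.mean raises TypeError on a single float, so the sketch falls
# back to one call on the whole list.
print(agg_sketch(values, mean))
```

In this model, a function that happily accepts a scalar never reaches the fallback, which mirrors what the question observed with lambda x: np.mean(x), while a function that fails on a scalar, such as one calling x.tolist(), is evaluated on the whole Series.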

Pandas inconsistency with aggregation with custom function?
