pandas groupby unexpectedly returns negative numbers

Time: 2017-08-18 17:27:37

Tags: pandas group-by

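I have yearly data on port freight volumes. I want to sum it per year and then convert each entry to a percentage of its year's total, but I unexpectedly get negative numbers: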

import pandas as pd

data = pd.Series(
{('2006', 'Oakland, CA (Port)'): 7460155164,
 ('2006', 'Rest of California'): 32868692124,
 ('2006', 'San Francisco, CA (Port)'): 2262901767,
 ('2007', 'Oakland, CA (Port)'): 7881218797,
 ('2007', 'Rest of California'): 38595482723,
 ('2007', 'San Francisco, CA (Port)'): 1897361592,
 ('2008', 'Oakland, CA (Port)'): 8325019179,
 ('2008', 'Rest of California'): 46200094019,
 ('2008', 'San Francisco, CA (Port)'): 2732413994,
 ('2009', 'Oakland, CA (Port)'): 9077952296,
 ('2009', 'Rest of California'): 42642020668,
 ('2009', 'San Francisco, CA (Port)'): 2998130982,
 ('2010', 'Oakland, CA (Port)'): 9596205900,
 ('2010', 'Rest of California'): 48091887406,
 ('2010', 'San Francisco, CA (Port)'): 2623519555,
 ('2011', 'Oakland, CA (Port)'): 10316313358,
 ('2011', 'Rest of California'): 54869935898,
 ('2011', 'San Francisco, CA (Port)'): 2591413704})
data

Summing the series by its first index level behaves as expected:

data.sum(level=0)
Out[27]:
2006    42591749055
2007    48374063112
2008    57257527192
2009    54718103946
2010    60311612861
2011    67777662960
dtype: int64

Or using `groupby`:

data.groupby(level=0).sum()


Out[26]:
2006    42591749055
2007    48374063112
2008    57257527192
2009    54718103946
2010    60311612861
2011    67777662960
dtype: int64

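I want to apply `lambda x: x / x.sum()` to compute the percentage within each group, but `x.sum()` gives unexpected results. When I sum inside the lambda function, I get these values: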

data.groupby(level=0).apply(lambda x: x.sum())

Out[28]:
2006    -357923905
2007    1129422856
2008    1422952344
2009   -1116470902
2010     182070717
2011    -941813776
dtype: int64
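A quick arithmetic check on the numbers above: each value returned by the lambda equals the correct per-year sum reduced to a signed 32-bit integer. The sketch below uses plain Python integers and a hypothetical helper, `wrap_int32`, which is not part of pandas. It only shows that the outputs are consistent with 32-bit wraparound. Where the narrowing to 32 bits happens (for example, a platform-default 32-bit integer on Windows) is an assumption, not something the outputs confirm.

```python
# Hypothetical helper (not a pandas function): reinterpret an integer as a
# signed 32-bit value, the way a fixed-width int32 accumulator would wrap.
def wrap_int32(n):
    return (n + 2**31) % 2**32 - 2**31

# Per-year sums reported by data.groupby(level=0).sum() above.
expected = {
    '2006': 42591749055,
    '2007': 48374063112,
    '2008': 57257527192,
    '2009': 54718103946,
    '2010': 60311612861,
    '2011': 67777662960,
}

# Per-year values reported by data.groupby(level=0).apply(lambda x: x.sum()).
observed = {
    '2006': -357923905,
    '2007': 1129422856,
    '2008': 1422952344,
    '2009': -1116470902,
    '2010': 182070717,
    '2011': -941813776,
}

for year, total in expected.items():
    # Every observed value matches the true sum wrapped to 32 bits.
    assert wrap_int32(total) == observed[year]
    print(year, total, '->', wrap_int32(total))
```

All six years match exactly, which makes a coincidence very unlikely.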

For the record, the grouping itself looks healthy and returns the expected sets of values:

data.groupby(level=0).apply(lambda x: [x])
Out[21]:
2006     [[7460155164, 32868692124, 2262901767]]
2007     [[7881218797, 38595482723, 1897361592]]
2008     [[8325019179, 46200094019, 2732413994]]
2009     [[9077952296, 42642020668, 2998130982]]
2010     [[9596205900, 48091887406, 2623519555]]
2011    [[10316313358, 54869935898, 2591413704]]
dtype: object

But why are the sums negative?
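For the within-group percentage that this question is ultimately after, one possible workaround is sketched below. It assumes the problem is integer overflow inside the Python-level lambda, so it casts to `float64` first and divides by `groupby(level=0).transform("sum")` instead of calling `apply`. Only the first two years of the data are reproduced, to keep the example short. The names `totals` and `share` are illustrative.

```python
import pandas as pd

# Subset of the data from the question (2006 and 2007 only).
data = pd.Series(
    {('2006', 'Oakland, CA (Port)'): 7460155164,
     ('2006', 'Rest of California'): 32868692124,
     ('2006', 'San Francisco, CA (Port)'): 2262901767,
     ('2007', 'Oakland, CA (Port)'): 7881218797,
     ('2007', 'Rest of California'): 38595482723,
     ('2007', 'San Francisco, CA (Port)'): 1897361592})

# Cast to float64 so that no fixed-width integer can overflow.
values = data.astype('float64')

# transform('sum') broadcasts each year's total back onto every row of that year.
totals = values.groupby(level=0).transform('sum')

# Share of each port within its year (a fraction; multiply by 100 for percent).
share = values / totals
print(share)
```

Each year's shares should add up to 1, which is a simple sanity check on the result.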

0 answers:

No answers yet