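I have yearly data on port freight volumes that I want to sum per year and convert to percentages, but I unexpectedly get negative numbers:
import pandas as pd
data = pd.Series(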
{('2006', 'Oakland, CA (Port)'): 7460155164,
('2006', 'Rest of California'): 32868692124,
('2006', 'San Francisco, CA (Port)'): 2262901767,
('2007', 'Oakland, CA (Port)'): 7881218797,
('2007', 'Rest of California'): 38595482723,
('2007', 'San Francisco, CA (Port)'): 1897361592,
('2008', 'Oakland, CA (Port)'): 8325019179,
('2008', 'Rest of California'): 46200094019,
('2008', 'San Francisco, CA (Port)'): 2732413994,
('2009', 'Oakland, CA (Port)'): 9077952296,
('2009', 'Rest of California'): 42642020668,
('2009', 'San Francisco, CA (Port)'): 2998130982,
('2010', 'Oakland, CA (Port)'): 9596205900,
('2010', 'Rest of California'): 48091887406,
('2010', 'San Francisco, CA (Port)'): 2623519555,
('2011', 'Oakland, CA (Port)'): 10316313358,
('2011', 'Rest of California'): 54869935898,
('2011', 'San Francisco, CA (Port)'): 2591413704})
data
Summing the series by year behaves as expected:
data.sum(level=0)
Out[27]:
2006 42591749055
2007 48374063112
2008 57257527192
2009 54718103946
2010 60311612861
2011 67777662960
dtype: int64
Or using `groupby`:
data.groupby(level=0).sum()
Out[26]:
2006 42591749055
2007 48374063112
2008 57257527192
2009 54718103946
2010 60311612861
2011 67777662960
dtype: int64
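I want to apply `lambda x: x / x.sum()` to get each port's percentage within its year, but `x.sum()` gives an unexpected result: when I sum inside the lambda function, I get negative values: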
data.groupby(level=0).apply(lambda x: x.sum())
Out[28]:
2006 -357923905
2007 1129422856
2008 1422952344
2009 -1116470902
2010 182070717
2011 -941813776
dtype: int64
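For clarity, the result I am after is each port's share of its year's total. A minimal sketch of that calculation, assuming a two-year subset of the data above with an explicit `int64` dtype, and using `transform('sum')` in place of the lambda (the names `sample`, `year_totals` and `percentages` are introduced here only for illustration):

```python
import pandas as pd

# Two-year subset of the data above, with the dtype pinned to int64.
sample = pd.Series(
    {('2006', 'Oakland, CA (Port)'): 7460155164,
     ('2006', 'Rest of California'): 32868692124,
     ('2006', 'San Francisco, CA (Port)'): 2262901767,
     ('2007', 'Oakland, CA (Port)'): 7881218797,
     ('2007', 'Rest of California'): 38595482723,
     ('2007', 'San Francisco, CA (Port)'): 1897361592},
    dtype='int64')

# Broadcast each year's total back onto every row of that year.
year_totals = sample.groupby(level=0).transform('sum')

# Share of each port within its year, as a percentage.
percentages = sample / year_totals * 100
print(percentages.round(2))
```

The percentages within each year should add up to 100, which is a quick way to check that the per-group totals are correct.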
For the record, the grouping itself looks healthy and returns the expected sets of values:
data.groupby(level=0).apply(lambda x: [x])
Out[21]:
2006 [[7460155164, 32868692124, 2262901767]]
2007 [[7881218797, 38595482723, 1897361592]]
2008 [[8325019179, 46200094019, 2732413994]]
2009 [[9077952296, 42642020668, 2998130982]]
2010 [[9596205900, 48091887406, 2623519555]]
2011 [[10316313358, 54869935898, 2591413704]]
dtype: object
So why the negative numbers?
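One observation that may help narrow this down: each wrong value appears to be the correct yearly total wrapped into the signed 32-bit integer range. The sketch below is only a consistency check in plain Python on the numbers shown above; it does not show where the narrowing happens, and `wrap_int32` is a helper name introduced here for illustration.

```python
# Correct yearly totals, copied from the groupby(level=0).sum() output above.
expected = {
    '2006': 42591749055,
    '2007': 48374063112,
    '2008': 57257527192,
    '2009': 54718103946,
    '2010': 60311612861,
    '2011': 67777662960,
}

# Values returned by apply(lambda x: x.sum()) above.
observed = {
    '2006': -357923905,
    '2007': 1129422856,
    '2008': 1422952344,
    '2009': -1116470902,
    '2010': 182070717,
    '2011': -941813776,
}


def wrap_int32(n):
    """Reduce n modulo 2**32 and interpret the result as a signed 32-bit integer."""
    return (n + 2**31) % 2**32 - 2**31


# Compare the wrapped correct total with the value actually observed.
for year in expected:
    print(year, wrap_int32(expected[year]), observed[year])
```

If the two printed columns agree for every year, the negative values are consistent with a 32-bit integer overflow rather than with a grouping problem.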