Pivoting by date with Pandas pd.pivot_table

Posted: 2015-10-16 18:38:01

Tags: python datetime pandas pivot-table

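I'm still pretty new to pandas and Python, and I'm worried I'm doing something dumb here. That said, the closest question I could find to my problem was How to create pivot with totals (margins) in Pandas?, so I'm asking.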

I have a simple DataFrame with 3 columns:

  Account ID Amount Close Date
0         10a    100 2009-01-01
1         10a     50 2009-01-01
2         10a    100 2010-04-01
3         10a    100 2011-04-01
4         10a    100 2012-05-01
..        ...    ...        ...
35         4b     .5 2009-01-01
36         4c     .5 2009-01-01
37         5a     .5 2009-01-01
38         5b     .5 2009-01-01
39         8a     .5 2009-01-01

I think my problem is with the Close Date column. I suspect pandas doesn't know that one 2009-01-01 is equal to another 2009-01-01.
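A quick way to test that suspicion is to parse the column with pd.to_datetime and check whether two rows for the same day land in the same group. This is a minimal sketch on a few rows copied from the sample above; it assumes Close Date arrives as strings, which the question doesn't confirm.

```python
import pandas as pd

# A few rows from the sample; Close Date is assumed to start out as plain strings
df = pd.DataFrame({
    "Account ID": ["10a", "10a", "10a", "4b"],
    "Amount": [100, 50, 100, 0.5],
    "Close Date": ["2009-01-01", "2009-01-01", "2010-04-01", "2009-01-01"],
})

# Parse the strings into datetime64 values
df["Close Date"] = pd.to_datetime(df["Close Date"])
print(df.dtypes)

# Two parsed timestamps for the same calendar day compare equal...
same_day = df.loc[0, "Close Date"] == df.loc[1, "Close Date"]
print(same_day)  # True

# ...so grouping on the column puts both 10a rows for 2009-01-01 in one group
counts = df.groupby(["Account ID", "Close Date"]).size()
print(counts)
```

If the real column already shows a datetime64 dtype, equal dates are being treated as equal and the problem lies elsewhere.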

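I want to pivot this table to get output like the following, grouped first by Account ID and then by Close Date. If an Account ID has several rows with the same Close Date, I want those amounts added together in the values column, like this. (For the record, I'm really only interested in the year, but I've been trying to simplify the problem as much as possible.)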

Account ID Close Date 
2c          2009-01-01  100
            2011-01-01  100
10a         2009-01-01  150
            2010-04-01  100
...
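Since only the year matters in the end, grouping directly on account and calendar year may be simpler than pivoting on full dates. A minimal sketch, assuming Close Date has already been parsed to datetime64; the rows come from the sample above, and the groupby here is an alternative approach, not the asker's code.

```python
import pandas as pd

# Sample rows from the question (Amount numeric, Close Date parsed to datetime64)
df = pd.DataFrame({
    "Account ID": ["10a", "10a", "10a", "10a", "10a", "4b"],
    "Amount": [100, 50, 100, 100, 100, 0.5],
    "Close Date": pd.to_datetime(["2009-01-01", "2009-01-01", "2010-04-01",
                                  "2011-04-01", "2012-05-01", "2009-01-01"]),
})

# Group keys can mix a column name and a derived Series (here, the year)
by_year = df.groupby(["Account ID", df["Close Date"].dt.year])["Amount"].sum()
print(by_year)
```

Rows that share both the account and the year are summed, so 10a's two 2009 rows collapse into one 150.0 entry.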

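I've tried all sorts of things and keep running into problems that make me think I have some kind of date issue. Maybe I need to import a different library?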

Here's my most recent attempt:

pd.pivot_table(opps, index=['Account ID'], columns='Close Date', values=['Amount'], aggfunc=np.sum)

The output is very close to what I want.

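The only problem is that for any Account ID that has two rows with the same date, that data simply disappears from the output. For 2009-01-01, account 10a has 3 rows, but the pivot table shows NaN for 2009-01-01.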
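One way to narrow this down is to run the same call on a small, clean frame. The sketch below assumes Amount is numeric and Close Date is datetime64 (the question doesn't show the real dtypes of opps) and uses aggfunc="sum" in place of np.sum. On data like this the duplicate 2009-01-01 rows for 10a are summed; NaN appears only where an account has no rows at all for a date.

```python
import pandas as pd

# Rows from the question's sample, with numeric Amount and datetime64 Close Date
opps = pd.DataFrame({
    "Account ID": ["10a", "10a", "10a", "4b"],
    "Amount": [100, 50, 100, 0.5],
    "Close Date": pd.to_datetime(["2009-01-01", "2009-01-01",
                                  "2010-04-01", "2009-01-01"]),
})

# Same shape as the attempt: one row per account, one column per date
table = pd.pivot_table(opps, index="Account ID", columns="Close Date",
                       values="Amount", aggfunc="sum")
print(table)

# Duplicate (account, date) rows are summed, not dropped
print(table.loc["10a", pd.Timestamp("2009-01-01")])  # 150.0
# An account/date pair with no rows shows up as NaN
print(table.loc["4b", pd.Timestamp("2010-04-01")])   # nan

# Worth comparing against the dtypes of the real frame
print(opps.dtypes)
```

If the real opps frame still produces NaN for 10a on 2009-01-01, a dtype mismatch in Amount or Close Date would be a reasonable first thing to check.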

I thought I'd try the same pivot table with margins=True.

When I did, I got this error:

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-182-f8dc0d75c868> in <module>()
      3                margins = "True",
      4                values=['Amount'],
----> 5                aggfunc=np.sum)

/Applications/anaconda/lib/python2.7/site-packages/pandas/tools/pivot.pyc in pivot_table(data, values, index, columns, aggfunc, fill_value, margins, dropna)
    141     if margins:
    142         table = _add_margins(table, data, values, rows=index,
--> 143                              cols=columns, aggfunc=aggfunc)
    144 
    145     # discard the top level

/Applications/anaconda/lib/python2.7/site-packages/pandas/tools/pivot.pyc in _add_margins(table, data, values, rows, cols, aggfunc)
    167 
    168     if values:
--> 169         marginal_result_set = _generate_marginal_results(table, data, values, rows, cols, aggfunc, grand_margin)
    170         if not isinstance(marginal_result_set, tuple):
    171             return marginal_result_set

/Applications/anaconda/lib/python2.7/site-packages/pandas/tools/pivot.pyc in _generate_marginal_results(table, data, values, rows, cols, aggfunc, grand_margin)
    236                 # we are going to mutate this, so need to copy!
    237                 piece = piece.copy()
--> 238                 piece[all_key] = margin[key]
    239 
    240                 table_pieces.append(piece)

/Applications/anaconda/lib/python2.7/site-packages/pandas/core/frame.pyc in __getitem__(self, key)
   1795             return self._getitem_multilevel(key)
   1796         else:
-> 1797             return self._getitem_column(key)
   1798 
   1799     def _getitem_column(self, key):

/Applications/anaconda/lib/python2.7/site-packages/pandas/core/frame.pyc in _getitem_column(self, key)
   1802         # get column
   1803         if self.columns.is_unique:
-> 1804             return self._get_item_cache(key)
   1805 
   1806         # duplicate columns & possible reduce dimensionaility

/Applications/anaconda/lib/python2.7/site-packages/pandas/core/generic.pyc in _get_item_cache(self, item)
   1082         res = cache.get(item)
   1083         if res is None:
-> 1084             values = self._data.get(item)
   1085             res = self._box_item_values(item, values)
   1086             cache[item] = res

/Applications/anaconda/lib/python2.7/site-packages/pandas/core/internals.pyc in get(self, item, fastpath)
   2849 
   2850             if not isnull(item):
-> 2851                 loc = self.items.get_loc(item)
   2852             else:
   2853                 indexer = np.arange(len(self.items))[isnull(self.items)]

/Applications/anaconda/lib/python2.7/site-packages/pandas/core/index.pyc in get_loc(self, key, method)
   1570         """
   1571         if method is None:
-> 1572             return self._engine.get_loc(_values_from_object(key))
   1573 
   1574         indexer = self.get_indexer([key], method=method)

pandas/index.pyx in pandas.index.IndexEngine.get_loc (pandas/index.c:3824)()

pandas/index.pyx in pandas.index.IndexEngine.get_loc (pandas/index.c:3704)()

pandas/hashtable.pyx in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:12280)()

pandas/hashtable.pyx in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:12231)()

KeyError: Timestamp('2009-01-01 00:00:00')
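The KeyError is raised while pandas builds the margins and looks up a Timestamp column label. One workaround that avoids Timestamp labels altogether, and fits the remark that only the year matters, is to pivot on an integer year column. This is a sketch assuming a recent pandas; "Close Year" is a hypothetical column name, and whether it avoids the error on the older pandas shown in the traceback (Anaconda, Python 2.7) is untested. It also passes margins as the boolean True, which is what the parameter is documented to take; the traceback shows the string "True", which happens to be truthy.

```python
import pandas as pd

opps = pd.DataFrame({
    "Account ID": ["10a", "10a", "10a", "4b"],
    "Amount": [100, 50, 100, 0.5],
    "Close Date": pd.to_datetime(["2009-01-01", "2009-01-01",
                                  "2010-04-01", "2009-01-01"]),
})

# Hypothetical helper column: reduce each date to a plain integer year
opps["Close Year"] = opps["Close Date"].dt.year

# margins=True adds an "All" row and an "All" column with the totals
table = pd.pivot_table(opps, index="Account ID", columns="Close Year",
                       values="Amount", aggfunc="sum", margins=True)
print(table)
```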

Thanks for any advice.

1 Answer:

Answer 0 (score: 0)

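This sounds like a groupby rather than a pivot table to me, since your output columns are fixed: you aren't spreading Close Date values out into new columns.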

For example:

import pandas as pd
from datetime import date

df = pd.DataFrame(data=[['10a', 100, date(2009, 1, 1)],
                        ['10a', 50, date(2009, 1, 1)],
                        ['10a', 100, date(2010, 4, 1)],
                        ['10a', 100, date(2011, 4, 1)],
                        ['10a', 100, date(2012, 5, 1)],
                        ['4b', .5, date(2009, 1, 1)],
                        ['4c', .5, date(2009, 1, 1)],
                        ['5a', .5, date(2009, 1, 1)],
                        ['5b', .5, date(2009, 1, 1)],
                        ['8a', .5, date(2009, 1, 1)]],
                  columns=['Account ID', 'Amount', 'Close Date'])

df.groupby(['Account ID', 'Close Date']).sum()

This gives:

                       Amount
Account ID Close Date        
10a        2009-01-01   150.0
           2010-04-01   100.0
           2011-04-01   100.0
           2012-05-01   100.0
4b         2009-01-01     0.5
4c         2009-01-01     0.5
5a         2009-01-01     0.5
5b         2009-01-01     0.5
8a         2009-01-01     0.5

Apologies if I've missed something.

The pivot table equivalent would be:

import numpy as np
df.pivot_table(index=['Account ID', 'Close Date'], values=['Amount'], aggfunc=np.sum)
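To check the equivalence claim, the two results can be compared directly. This sketch reuses a subset of the rows above and passes aggfunc="sum", a string that pivot_table also accepts in place of np.sum.

```python
import pandas as pd
from datetime import date

df = pd.DataFrame(data=[['10a', 100, date(2009, 1, 1)],
                        ['10a', 50, date(2009, 1, 1)],
                        ['10a', 100, date(2010, 4, 1)],
                        ['4b', .5, date(2009, 1, 1)]],
                  columns=['Account ID', 'Amount', 'Close Date'])

# Long-format result from groupby, keeping Amount as a DataFrame column
grouped = df.groupby(['Account ID', 'Close Date'])[['Amount']].sum()

# pivot_table with every key in the index and no columns argument
pivoted = df.pivot_table(index=['Account ID', 'Close Date'],
                         values=['Amount'], aggfunc='sum')

# Compare the (account, date) -> amount mappings of the two results
same = grouped['Amount'].to_dict() == pivoted['Amount'].to_dict()
print(same)
```

With no columns argument, pivot_table has nothing to spread into new columns, so both calls reduce to the same grouped sum.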