Question

I have a pandas DataFrame (named "df1") structured as follows (although I have many months' worth of daily data):

           date  WeightedReturn
0    15/07/2015        0.005128
1    15/07/2015        0.002844
2    15/07/2015        0.003055
3    15/07/2015       -0.001481
4    15/07/2015       -0.000741
5    15/07/2015       -0.000741
6    16/07/2015       -0.004253
7    16/07/2015       -0.001712
8    16/07/2015       -0.001712
9    21/07/2015       -0.000178
10   21/07/2015       -0.000089
11   21/07/2015       -0.00008

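I want to create a new DataFrame from it that acts like a pivot table: it should merge the rows that share a date and sum the weighted returns for that day, giving something like: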

  date        WeightedReturn
0 15/07/2015    0.00806425
1 16/07/2015    -0.007676
2 21/07/2015    -0.000356

I tried using the groupby function:

df2 = df1.groupby('date').sum()

This (sort of) works, but the output then orders the dates incorrectly, by day of the month:

            WeightedReturn
date                      
01/09/2015        0.004803
02/09/2015        0.005144
03/08/2015       -0.000120
03/09/2015       -0.025164
04/08/2015        0.003956
04/09/2015        0.008942
05/08/2015       -0.01323

As you can see, this is not in chronological order.

So I tried using the pivot table function, but I got very confused when reading its documentation.

I tried:

df2 = pandas.pivot_table(df1, values="Weighted Return", index="date",aggfunc=np.sum)

and I get this error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Python27\lib\site-packages\pandas\tools\pivot.py", line 147, in pivot_table
    table = table[values[0]]
  File "C:\Python27\lib\site-packages\pandas\core\frame.py", line 1791, in __getitem__
    return self._getitem_column(key)
  File "C:\Python27\lib\site-packages\pandas\core\frame.py", line 1798, in _getitem_column
    return self._get_item_cache(key)
  File "C:\Python27\lib\site-packages\pandas\core\generic.py", line 1084, in _get_item_cache
    values = self._data.get(item)
  File "C:\Python27\lib\site-packages\pandas\core\internals.py", line 2851, in get
    loc = self.items.get_loc(item)
  File "C:\Python27\lib\site-packages\pandas\core\index.py", line 1578, in get_loc
    return self._engine.get_loc(_values_from_object(key))
  File "pandas\index.pyx", line 134, in pandas.index.IndexEngine.get_loc (pandas\index.c:3811)
  File "pandas\index.pyx", line 154, in pandas.index.IndexEngine.get_loc (pandas\index.c:3691)
  File "pandas\hashtable.pyx", line 697, in pandas.hashtable.PyObjectHashTable.get_item (pandas\hashtable.c:12336)
  File "pandas\hashtable.pyx", line 705, in pandas.hashtable.PyObjectHashTable.get_item (pandas\hashtable.c:12287)
KeyError: 'Weighted Return'

Could some kind person please point out where I am going wrong?

Answer 1

If you don't want groupby to sort the group keys (its default behaviour), just pass sort=False:

>>> df.groupby('date', sort=False).sum()
            WeightedReturn
date                      
15/07/2015        0.008064
16/07/2015       -0.007677
21/07/2015       -0.000347

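The groups then appear in the order in which each date first occurs in the column. Alternatively, you can convert the date column to the datetime64 type and then use groupby as before. At the moment the dates are plain strings, so they are sorted lexicographically (character by character, starting with the day) rather than chronologically.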
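A minimal sketch of the datetime conversion, assuming the dates are day-first strings in the `%d/%m/%Y` layout shown in the question. The small frame below is a made-up sample with illustrative values, not the asker's real data.

```python
import pandas as pd

# Illustrative sample (assumed values), deliberately not in chronological order.
df1 = pd.DataFrame({
    "date": ["03/09/2015", "03/09/2015", "03/08/2015", "15/07/2015", "15/07/2015"],
    "WeightedReturn": [0.001, 0.002, -0.00012, 0.005128, 0.002844],
})

# Parse the day-first strings into real datetime64 values.
df1["date"] = pd.to_datetime(df1["date"], format="%d/%m/%Y")

# groupby sorts the group keys by default; with datetimes that order is chronological.
df2 = df1.groupby("date", as_index=False)["WeightedReturn"].sum()
print(df2)
```

If the dd/mm/yyyy text is needed again for display, `df2["date"].dt.strftime("%d/%m/%Y")` converts the column back to strings after the grouping is done.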

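The error from your pivot table occurs because you entered the column name as "Weighted Return" (note the space) instead of "WeightedReturn". However, pivot_table returns a sorted index by default, so fixing the name alone brings you back to the original problem.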
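As a rough sketch, the pivot_table call with the corrected column name could look like the following. It assumes the date strings are first parsed with `pd.to_datetime` so that the sorted index comes out chronological; the sample rows are invented for illustration.

```python
import pandas as pd

# Illustrative sample (assumed values), not the asker's real data.
df1 = pd.DataFrame({
    "date": ["01/09/2015", "03/08/2015", "03/08/2015", "15/07/2015"],
    "WeightedReturn": [0.004803, -0.0001, -0.00002, 0.005128],
})

# Convert the day-first strings so that sorting the index means sorting by time.
df1["date"] = pd.to_datetime(df1["date"], format="%d/%m/%Y")

# Note the column name has no space: "WeightedReturn".
table = pd.pivot_table(df1, values="WeightedReturn", index="date", aggfunc="sum")
print(table)
```

For a single value column and a single aggregation like this, the result carries the same information as the groupby sum, so either approach should do.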

Pandas: Maintain date order when using groupby or pivot table
