Why doesn't pandas allow categorical columns in groupby?

Asked: 2016-05-17 14:40:07

Tags: python pandas

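I want to create a DataFrame with a custom sort order. To do that I used pandas.Categorical(), but when I use the resulting column in a groupby, the result contains only NaN values.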

# import the pandas module
import pandas as pd

# Create an example dataframe
raw_data = {'Date': ['2016-05-13', '2016-05-13', '2016-05-13', '2016-05-13', '2016-05-13','2016-05-13', '2016-05-13', '2016-05-13', '2016-05-13', '2016-05-13', '2016-05-13', '2016-05-13', '2016-05-13', '2016-05-13', '2016-05-13', '2016-05-13', '2016-05-13'],
        'Portfolio': ['A', 'A', 'A', 'A', 'A', 'A', 'B', 'B','B', 'B', 'B', 'C', 'C', 'C', 'C', 'C', 'C'],
        'Duration': [1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3],
        'Yield': [0.3, 0.3, 0.3, 0.3, 0.3, 0.3, 2, 2, 2, 2, 2, 1, 1, 1, 1, 1, 1],}

df = pd.DataFrame(raw_data, columns=['Date', 'Portfolio', 'Duration', 'Yield'])

# Give 'Portfolio' a custom category order (C, B, A) and sort by it
df['Portfolio'] = pd.Categorical(df['Portfolio'], ['C', 'B', 'A'])
df = df.sort_values('Portfolio')

dfs = df.groupby(['Date', 'Portfolio'], as_index=False).sum()

print(dfs)

                        Date    Portfolio   Duration   Yield
Date        Portfolio               
13/05/2016  C           NaN     NaN         NaN        NaN
            B           NaN     NaN         NaN        NaN
            A           NaN     NaN         NaN        NaN

Why does this happen, and how can I work around it?

This also raises a SettingWithCopyWarning. Is there a better idiom for creating the categorical column?

1 answer:

Answer 0: (score: 1)

as_index=False is what messes things up. If I just run:

dfs = df.groupby(['Date','Portfolio']).sum()

I get:

                      Duration  Yield
Date       Portfolio                 
2016-05-13 C                18    6.0
           B                10   10.0
           A                 6    1.8

I don't know why this happens. It may be a bug.

If you really want the result without an index, with 'Date' and 'Portfolio' as ordinary columns, use reset_index():

dfs = df.groupby(['Date','Portfolio']).sum().reset_index()

         Date Portfolio  Duration  Yield
0  2016-05-13         C        18    6.0
1  2016-05-13         B        10   10.0
2  2016-05-13         A         6    1.8
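
For reference, here is a self-contained sketch of the reset_index() workaround. It rebuilds the question's data more compactly and makes a few assumptions that are not in the original post: the custom order is expressed with pd.CategoricalDtype, the frame is copied explicitly before the column is assigned (one common way to avoid SettingWithCopyWarning when the frame is a slice of another one), and observed=True is passed to groupby, which requires pandas 0.23 or later.

```python
import pandas as pd

# Same data as in the question, built more compactly.
df = pd.DataFrame({
    'Date': ['2016-05-13'] * 17,
    'Portfolio': ['A'] * 6 + ['B'] * 5 + ['C'] * 6,
    'Duration': [1] * 6 + [2] * 5 + [3] * 6,
    'Yield': [0.3] * 6 + [2] * 5 + [1] * 6,
})

# Assumption: if df were a slice of a larger frame, taking an explicit copy
# first is one way to avoid SettingWithCopyWarning on the assignment below.
df = df.copy()

# Express the custom order C, B, A as an ordered categorical dtype.
portfolio_order = pd.CategoricalDtype(categories=['C', 'B', 'A'], ordered=True)
df['Portfolio'] = df['Portfolio'].astype(portfolio_order)

# Group with the keys in the index, then move them back into columns.
# observed=True keeps only the category combinations present in the data.
dfs = (
    df.groupby(['Date', 'Portfolio'], observed=True)[['Duration', 'Yield']]
      .sum()
      .reset_index()
)

print(dfs)
```

The rows come back in category order (C, B, A) because groupby sorts a categorical key by its category order, not alphabetically.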