I want to create a DataFrame with a custom sort order. To do this I used pandas.Categorical(), but when I use the result in a groupby, I get NaN values back.
# import the pandas module
import pandas as pd
# Create an example dataframe
raw_data = {'Date': ['2016-05-13', '2016-05-13', '2016-05-13', '2016-05-13', '2016-05-13','2016-05-13', '2016-05-13', '2016-05-13', '2016-05-13', '2016-05-13', '2016-05-13', '2016-05-13', '2016-05-13', '2016-05-13', '2016-05-13', '2016-05-13', '2016-05-13'],
'Portfolio': ['A', 'A', 'A', 'A', 'A', 'A', 'B', 'B','B', 'B', 'B', 'C', 'C', 'C', 'C', 'C', 'C'],
'Duration': [1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3],
'Yield': [0.3, 0.3, 0.3, 0.3, 0.3, 0.3, 2, 2, 2, 2, 2, 1, 1, 1, 1, 1, 1],}
df = pd.DataFrame(raw_data, columns = ['Date', 'Portfolio', 'Duration', 'Yield'])
df['Portfolio'] = pd.Categorical(df['Portfolio'],['C', 'B', 'A'])
df = df.sort_values('Portfolio')
dfs = df.groupby(['Date', 'Portfolio'], as_index=False).sum()
print(dfs)
                          Date Portfolio  Duration  Yield
Date       Portfolio
13/05/2016 C               NaN       NaN       NaN    NaN
           B               NaN       NaN       NaN    NaN
           A               NaN       NaN       NaN    NaN
Why does this happen, and how can I get around it?
It also raises a SettingWithCopyWarning.
Is there a better idiom for working with categoricals?
Answer (score: 1)
as_index=False is what messes things up. If I just run:
dfs = df.groupby(['Date','Portfolio']).sum()
I get:
                      Duration  Yield
Date       Portfolio
2016-05-13 C                18    6.0
           B                10   10.0
           A                 6    1.8
I don't know why this happens; it may be a bug.
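Whether as_index=False misbehaves here likely depends on the pandas version. As a sketch, assuming a recent pandas (1.x or 2.x, not necessarily the version used in the question), the same call can be checked directly. The data is the question's, rebuilt with list repetition; observed=True (available since pandas 0.23) is my addition and only emits groups for categories that actually occur.

```python
import pandas as pd

# The question's data, rebuilt with list repetition (same 17 rows)
raw_data = {
    "Date": ["2016-05-13"] * 17,
    "Portfolio": ["A"] * 6 + ["B"] * 5 + ["C"] * 6,
    "Duration": [1] * 6 + [2] * 5 + [3] * 6,
    "Yield": [0.3] * 6 + [2] * 5 + [1] * 6,
}
df = pd.DataFrame(raw_data)
df["Portfolio"] = pd.Categorical(df["Portfolio"], categories=["C", "B", "A"], ordered=True)

# as_index=False keeps the group keys as ordinary columns;
# observed=True only emits groups for categories present in the data
dfs = df.groupby(["Date", "Portfolio"], as_index=False, observed=True).sum()
print(dfs)
```

If this prints real sums on your version, the NaN output in the question was probably version-specific.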
If you really want the result without the index, with 'Date' and 'Portfolio' as ordinary columns, use reset_index():
dfs = df.groupby(['Date','Portfolio']).sum().reset_index()
         Date Portfolio  Duration  Yield
0  2016-05-13         C        18    6.0
1  2016-05-13         B        10   10.0
2  2016-05-13         A         6    1.8
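On the question about a better idiom: one option is to state the custom order once as an ordered pd.CategoricalDtype and apply it with astype. This is a sketch, assuming pandas 0.21 or later (where pd.CategoricalDtype exists); portfolio_order and top_ranked are hypothetical names chosen for illustration, and the data is the question's, rebuilt with list repetition.

```python
import pandas as pd

# The question's data, rebuilt with list repetition (same 17 rows)
raw_data = {
    "Date": ["2016-05-13"] * 17,
    "Portfolio": ["A"] * 6 + ["B"] * 5 + ["C"] * 6,
    "Duration": [1] * 6 + [2] * 5 + [3] * 6,
    "Yield": [0.3] * 6 + [2] * 5 + [1] * 6,
}
df = pd.DataFrame(raw_data)

# Hypothetical name: portfolio_order encodes the ranking C < B < A
portfolio_order = pd.CategoricalDtype(categories=["C", "B", "A"], ordered=True)
df["Portfolio"] = df["Portfolio"].astype(portfolio_order)

# sort_values follows the category order instead of alphabetical order;
# reset_index(drop=True) renumbers the rows 0..n-1 after sorting
df = df.sort_values("Portfolio").reset_index(drop=True)

# Because the dtype is ordered, comparisons use the ranking too:
# rows ranked before "A" are the C and B rows
top_ranked = df[df["Portfolio"] < "A"]

print(df["Portfolio"].drop_duplicates().tolist())
print(len(top_ranked))
```

Defining the dtype once also makes it easy to reuse the same ordering on other frames with astype(portfolio_order).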