I'm new to pandas, and I have a DataFrame of this form:
date category value
0 2017-11-30 13:58:57 A 901
1 2017-11-30 13:59:41 B 905
2 2017-11-30 13:59:41 C 925
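The first column holds dates, and the second is a categorical column with three known categories.
It is generated by:
import pandas as pd
# DataFrame.from_items was removed in pandas 1.0; a plain dict keeps the column order.
df = pd.DataFrame({'date': ['2017-11-30 13:58:57', '2017-11-30 13:59:41', '2017-11-30 13:59:41'],
                   'category': ['A', 'B', 'C'],
                   'value': [901, 905, 925]})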
df['date'] = pd.to_datetime(df['date'])
df['category'] = df['category'].astype('category')
The problem is that not every category is present for every date. I would like to add the missing categories, with a missing value:
date category value
0 2017-11-30 13:58:57 A 901
1 2017-11-30 13:58:57 B nan
2 2017-11-30 13:58:57 C nan
3 2017-11-30 13:59:41 A nan
4 2017-11-30 13:59:41 B 905
5 2017-11-30 13:59:41 C 925
Is there a built-in way to do this without iterating over the rows?
Answer 0: (score: 0)
You can use reindex with MultiIndex.from_product:
df = df.set_index(['date','category'])
cats = pd.MultiIndex.from_product(df.index.levels, names=df.index.names)
df = df.reindex(cats).reset_index()
print(df)
date category value
0 2017-11-30 13:58:57 A 901.0
1 2017-11-30 13:58:57 B NaN
2 2017-11-30 13:58:57 C NaN
3 2017-11-30 13:59:41 A NaN
4 2017-11-30 13:59:41 B 905.0
5 2017-11-30 13:59:41 C 925.0
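Alternatively, use set_index with unstack and stack:
# stack drops NaN rows by default; dropna=False keeps the missing combinations
df = (df.set_index(['date','category'])['value']
        .unstack()
        .stack(dropna=False)
        .reset_index(name='value'))
print(df)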
date category value
0 2017-11-30 13:58:57 A 901.0
1 2017-11-30 13:58:57 B NaN
2 2017-11-30 13:58:57 C NaN
3 2017-11-30 13:59:41 A NaN
4 2017-11-30 13:59:41 B 905.0
5 2017-11-30 13:59:41 C 925.0
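The reindex approach can also be wrapped in a small helper that takes the list of categories explicitly, which may be handy if a known category never appears in the data at all. This is only a sketch: the function name fill_missing_categories and its categories parameter are made up for illustration, and it assumes each (date, category) pair occurs at most once, since reindex raises an error on a duplicated index.

```python
import pandas as pd

df = pd.DataFrame({
    "date": pd.to_datetime(["2017-11-30 13:58:57",
                            "2017-11-30 13:59:41",
                            "2017-11-30 13:59:41"]),
    "category": ["A", "B", "C"],
    "value": [901, 905, 925],
})

def fill_missing_categories(frame, categories):
    """Hypothetical helper: one row per (date, category) pair, NaN where absent."""
    # Cartesian product of the observed dates and the categories passed in
    full_index = pd.MultiIndex.from_product(
        [frame["date"].unique(), categories],
        names=["date", "category"],
    )
    # Assumes (date, category) pairs are unique in `frame`
    return (frame.set_index(["date", "category"])
                 .reindex(full_index)
                 .reset_index())

result = fill_missing_categories(df, ["A", "B", "C"])
print(result)
```

The value column comes back as float because NaN cannot be stored in an integer column by default.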