While experimenting with pandas in Jupyter, I noticed a very strange symptom. I reduced it to the minimal code that reproduces it:
import pandas as pd
import numpy as np
from datetime import datetime
df = pd.DataFrame({
    'A': ['a', 'b', 'c'],
    'B': [datetime(2018, 11, 1), datetime(2018, 11, 2), datetime(2018, 11, 3)]
})
df
   A          B
0  a 2018-11-01
1  b 2018-11-02
2  c 2018-11-03
def process(gdf):
    return pd.Series({
        'C': datetime(2018, 11, 5)
    })
df2 = df.groupby(['A']).apply(process).reset_index()
df2
   A                    C
0  a  1541376000000000000
1  b  1541376000000000000
2  c  1541376000000000000
df2['C']
0 1541376000000000000
1 1541376000000000000
2 1541376000000000000
Name: C, dtype: int64
As you can see, column C ends up with dtype int64 instead of the expected datetime64[ns]. However, if the DataFrame has no column B, column C correctly ends up as datetime64[ns]:
df = pd.DataFrame({
    'A': ['a', 'b', 'c'],
    # 'B': [datetime(2018, 11, 1), datetime(2018, 11, 2), datetime(2018, 11, 3)]
})
df
   A
0  a
1  b
2  c
def process(gdf):
    return pd.Series({
        'C': datetime(2018, 11, 5)
    })
df2 = df.groupby(['A']).apply(process).reset_index()
df2
   A          C
0  a 2018-11-05
1  b 2018-11-05
2  c 2018-11-05
df2['C']
0 2018-11-05
1 2018-11-05
2 2018-11-05
Name: C, dtype: datetime64[ns]
I have no idea what is going on. Does anyone know? I am using Python 3.6 and pandas 0.23.1.
Answer 0 (score: 0)
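First of all, this looks like a bug.
As a workaround, I think you can create the new column inside each group and return the group gdf itself instead of returning a Series: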
def process(gdf):
    gdf['C'] = datetime(2018, 11, 5)
    return gdf
df2 = df.groupby(['A']).apply(process)
print (df2)
   A          B          C
0  a 2018-11-01 2018-11-05
1  b 2018-11-02 2018-11-05
2  c 2018-11-03 2018-11-05
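If the int64 column has already been produced, it may also be possible to recover the dates afterwards. The integers in the question (1541376000000000000) appear to be nanoseconds since the Unix epoch, so a minimal sketch, assuming that interpretation holds, is to pass them back through pd.to_datetime with unit="ns". The names raw and restored are only illustrative, and the Series is rebuilt by hand here rather than taken from the groupby result:

```python
import pandas as pd

# Integer values copied from the int64 column C shown in the question.
raw = pd.Series([1541376000000000000] * 3, name="C")

# Interpret the integers as nanoseconds since the Unix epoch.
restored = pd.to_datetime(raw, unit="ns")
print(restored)
```

On the DataFrame from the question, the same idea would be df2['C'] = pd.to_datetime(df2['C'], unit='ns'). This only repairs the result after the fact; it does not explain why the dtype was lost.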