Pandas groupby apply with datetime gives anomalous result

Time: 2018-11-20 06:40:16

Tags: python pandas datetime group-by

While experimenting with pandas in Jupyter, I noticed a very strange symptom. I have reduced it to the minimal code that reproduces it:

import pandas as pd
import numpy as np
from datetime import datetime

df = pd.DataFrame({
    'A': ['a', 'b', 'c'],
    'B': [datetime(2018, 11, 1), datetime(2018, 11, 2), datetime(2018, 11, 3) ]
})
df

    A   B
0   a   2018-11-01
1   b   2018-11-02
2   c   2018-11-03

def process(gdf):
    return pd.Series({
        'C': datetime(2018, 11, 5)
    })
df2 = df.groupby(['A']).apply(process).reset_index()
df2

    A   C
0   a   1541376000000000000
1   b   1541376000000000000
2   c   1541376000000000000

df2['C']

0    1541376000000000000
1    1541376000000000000
2    1541376000000000000
Name: C, dtype: int64

As you can see, column C ends up with dtype int64 instead of the expected datetime64[ns]. However, if I leave out column B, column C correctly ends up as datetime64[ns].
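For reference, the integers in column C look like the same timestamp expressed as nanoseconds since the Unix epoch, so the value is preserved and only the dtype is lost. A minimal check, assuming the literal value printed above:

```python
import pandas as pd
from datetime import datetime

# Value copied from the int64 output of column C.
raw = 1541376000000000000

# pd.Timestamp interprets a bare integer as nanoseconds since the epoch.
ts = pd.Timestamp(raw)

print(ts)                                          # 2018-11-05 00:00:00
print(ts == pd.Timestamp(datetime(2018, 11, 5)))   # True
```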

df = pd.DataFrame({
    'A': ['a', 'b', 'c'],
    # 'B': [datetime(2018, 11, 1), datetime(2018, 11, 2), datetime(2018, 11, 3) ]
})
df

    A
0   a
1   b
2   c

def process(gdf):
    return pd.Series({
        'C': datetime(2018, 11, 5)
    })
df2 = df.groupby(['A']).apply(process).reset_index()
df2

    A   C
0   a   2018-11-05
1   b   2018-11-05
2   c   2018-11-05

df2['C']

0   2018-11-05
1   2018-11-05
2   2018-11-05
Name: C, dtype: datetime64[ns]

I have no idea what is going on. Does anyone know? I am using Python 3.6 and pandas 0.23.1.

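1 answer:

Answer 0: (score: 0)

First of all, this looks like a bug.

I think that here you can create the new column within each group and, instead of returning a Series, return the group gdf itself: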

def process(gdf):
    gdf['C'] = datetime(2018, 11, 5)
    return gdf

df2 = df.groupby(['A']).apply(process)
print(df2)
   A          B          C
0  a 2018-11-01 2018-11-05
1  b 2018-11-02 2018-11-05
2  c 2018-11-03 2018-11-05
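If you would rather keep the function that returns a Series, another possible workaround is to convert the affected column back after the fact. This is only a sketch: it rebuilds the int64 result from the question by hand (the frame below is an assumption standing in for the real output of apply) and relies on pd.to_datetime treating the integers as nanoseconds since the epoch:

```python
import pandas as pd

# Stand-in for the df2 from the question, with C already degraded to int64.
df2 = pd.DataFrame({
    'A': ['a', 'b', 'c'],
    'C': [1541376000000000000] * 3,
})

# Reinterpret the integers as epoch nanoseconds to restore a datetime column.
df2['C'] = pd.to_datetime(df2['C'], unit='ns')

print(df2)
print(df2['C'].dtype)
```

This does not explain why the dtype is lost, it only repairs the result, so it is best treated as a stopgap.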