Pandas df.applymap() causes an unwanted type conversion from datetime64[ns] to timedelta

Time: 2018-01-15 16:54:51

Tags: python pandas datetime

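When I run applymap on a masked subset of the df's datetime columns, two of the four columns get converted to timedelta. I can't figure out what is happening. Could it be a bug similar to https://github.com/pandas-dev/pandas/issues/18493? But then why only two of the four?!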

print time_data.dtypes, time_data[nt].dtypes    

time_data[nt] = time_data[nt].applymap(lambda x: x.strftime('%I:%M:%S %p') if pd.notnull(x) else pd.NaT)

time_data['Total Clock Time'] = time_data['Total Clock Time'].apply(lambda x: x.seconds/3600)

print time_data.dtypes, time_data[nt].dtypes

Date                        object
Name                        object
In AM               datetime64[ns]
Out AM              datetime64[ns]
In PM               datetime64[ns]
Out PM              datetime64[ns]
Sick Time           datetime64[ns]
Total Clock Time            object
dtype: object 

In AM        datetime64[ns]
Out AM       datetime64[ns]
In PM        datetime64[ns]
Out PM       datetime64[ns]
Sick Time    datetime64[ns]
dtype: object

Date                         object
Name                         object
In AM                        object
Out AM                       object
In PM               timedelta64[ns]
Out PM              timedelta64[ns]
Sick Time            datetime64[ns]
Total Clock Time            float64
dtype: object 

In AM                 object
Out AM                object
In PM        timedelta64[ns]
Out PM       timedelta64[ns]
Sick Time     datetime64[ns]
dtype: object
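Separately from the dtype question, the `Total Clock Time` line in the code above computes `x.seconds/3600` per element. `timedelta.seconds` only counts the part below one day, and under Python 2 `/` between two ints floors, so 8.5 hours becomes 8. A vectorized sketch, assuming the column holds timedelta-like values (the stand-in data below is hypothetical, since the real column isn't shown):

```python
import pandas as pd

# Hypothetical stand-in for 'Total Clock Time'; None becomes NaT
total = pd.Series(pd.to_timedelta(["08:30:00", "1 days 02:00:00", None]))

# total_seconds() includes whole days and keeps missing values as NaN;
# dividing by 3600 gives fractional hours as float64
hours = total.dt.total_seconds() / 3600
print(hours.tolist())  # [8.5, 26.0, nan]

# For comparison, .dt.seconds drops the day component (26 h -> 2 h)
print((total.dt.seconds / 3600).tolist())  # [8.5, 2.0, nan]
```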

The data looks like this:

         Date           Name               In AM              Out AM  \
0  2017-11-06   AUSTIN LEWIS 1900-01-01 06:10:24 1900-01-01 12:03:23   
1  2017-11-06     FRED MOORE 1900-01-01 06:58:37 1900-01-01 12:12:11   
2  2017-11-06  KERRIE PAUSSA 1900-01-01 11:58:48 1900-01-01 19:39:49   
3  2017-11-06   OMAR CUELLAR                 NaT                 NaT   
4  2017-11-07   AUSTIN LEWIS 1900-01-01 07:07:27 1900-01-01 12:06:43   

            In PM              Out PM Sick Time  
0 1900-01-01 12:32:03 1900-01-01 17:31:50       NaT  
1 1900-01-01 12:42:53 1900-01-01 17:31:50       NaT  
2                 NaT                 NaT       NaT  
3 1900-01-01 20:00:19 1900-01-01 23:59:41       NaT  
4 1900-01-01 12:35:26 1900-01-01 17:33:20       NaT              
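The original `time_data` isn't included, so here is a minimal reconstruction of the five rows shown above (Date and Name left out), assuming the times were parsed with `pd.to_datetime`. It runs the question's `pd.NaT`-filling lambda so the resulting dtypes can be compared on your own pandas version; the timedelta conversion was reported against the pandas of early 2018 and may not reproduce on newer releases. The `elementwise` helper is hypothetical: it calls `DataFrame.map` (the pandas 2.1+ name for `applymap`) when it exists and falls back to `applymap` otherwise.

```python
import pandas as pd

# Rows 0-4 copied from the printout above; None marks the NaT cells
raw = {
    "In AM": ["1900-01-01 06:10:24", "1900-01-01 06:58:37", "1900-01-01 11:58:48", None, "1900-01-01 07:07:27"],
    "Out AM": ["1900-01-01 12:03:23", "1900-01-01 12:12:11", "1900-01-01 19:39:49", None, "1900-01-01 12:06:43"],
    "In PM": ["1900-01-01 12:32:03", "1900-01-01 12:42:53", None, "1900-01-01 20:00:19", "1900-01-01 12:35:26"],
    "Out PM": ["1900-01-01 17:31:50", "1900-01-01 17:31:50", None, "1900-01-01 23:59:41", "1900-01-01 17:33:20"],
    "Sick Time": [None] * 5,
}
nt = list(raw)  # the time columns, in the question's order
time_data = pd.DataFrame({col: pd.to_datetime(vals) for col, vals in raw.items()})


def elementwise(df, func):
    # Hypothetical helper: DataFrame.map exists from pandas 2.1; older versions only have applymap
    return df.map(func) if hasattr(pd.DataFrame, "map") else df.applymap(func)


# The question's transformation, including the pd.NaT fill value
result = elementwise(time_data[nt], lambda x: x.strftime('%I:%M:%S %p') if pd.notnull(x) else pd.NaT)
print(time_data.dtypes)
print(result.dtypes)
```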

1 answer:

Answer 0 (score: 1)

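strftime returns strings, so those columns come back with object dtype. The other two columns end up as timedelta because you fill the blanks with pd.NaT, a datetime-like missing value that pandas' type inference can pick up. Use np.nan instead: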

df[nt].applymap(lambda x: x.strftime('%I:%M:%S %p') if pd.notnull(x) else np.nan)
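A self-contained sketch of the suggested fix on a single hypothetical `In PM` column; `to_clock` is an illustrative name, and the `hasattr` check only chooses `DataFrame.map` (pandas 2.1+) over the older `applymap`. Because missing cells come back as `np.nan` rather than `pd.NaT`, the column holds plain strings plus NaN and should not be inferred as datetime-like.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"In PM": pd.to_datetime(["1900-01-01 12:32:03", None, "1900-01-01 20:00:19"])})


def to_clock(x):
    # Format real timestamps as 12-hour clock strings; missing values become a float NaN
    return x.strftime('%I:%M:%S %p') if pd.notnull(x) else np.nan


# DataFrame.map replaced applymap in pandas 2.1; fall back for older versions
formatted = df.map(to_clock) if hasattr(pd.DataFrame, "map") else df.applymap(to_clock)
print(formatted["In PM"].tolist())  # ['12:32:03 PM', nan, '08:00:19 PM']
print(formatted.dtypes)
```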


df[nt].applymap(lambda x: x.strftime('%I:%M:%S %p') if pd.notnull(x) else np.nan).dtypes
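A vectorized alternative, not part of the original answer, is `Series.dt.strftime` applied per column: it formats a whole column at once and returns a missing value (NaN, not NaT) where the input was NaT, so no fill value has to be chosen by hand. The two-column frame below is a hypothetical subset of the question's data.

```python
import pandas as pd

nt = ["In PM", "Out PM"]  # hypothetical subset of the question's time columns
df = pd.DataFrame({
    "In PM": pd.to_datetime(["1900-01-01 12:32:03", None]),
    "Out PM": pd.to_datetime(["1900-01-01 17:31:50", None]),
})

# Format each datetime column in one vectorized call instead of a per-element lambda
formatted = df[nt].apply(lambda col: col.dt.strftime('%I:%M:%S %p'))
print(formatted)
print(formatted.dtypes)
```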
