How to fill NaT and NaN values separately

Date: 2019-02-17 23:59:44

Tags: python pandas datetime nan

My DataFrame contains both NaT and NaN values:

    Date/Time_entry      Entry      Date/Time_exit       Exit   
0   2015-11-11 10:52:00  19.9900    2015-11-11 11:30:00  20.350 
1   2015-11-11 11:36:00  20.4300    2015-11-11 11:38:00  20.565 
2   2015-11-11 11:44:00  21.0000    NaT                  NaN        
3   2009-04-20 10:28:00  13.7788    2009-04-20 10:46:00  13.700

I want to fill NaT with a date and NaN with a number. However, fillna(4) replaces both NaT and NaN with 4. Is there a way to tell NaT and NaN apart when filling?

My current workaround is to fill column by column with df[column].fillna().
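When the columns are known by name, the column-by-column workaround can be collapsed into a single call, because DataFrame.fillna accepts a dict that maps column names to fill values. A minimal sketch on a trimmed-down copy of the frame above; the fill timestamp 2015-11-11 12:00:00 is an arbitrary assumption:

```python
import numpy as np
import pandas as pd

# Two rows of the question's data; row 1 has NaT in Date/Time_exit and NaN in Exit.
df = pd.DataFrame({
    "Date/Time_entry": pd.to_datetime(["2015-11-11 10:52:00", "2015-11-11 11:44:00"]),
    "Entry": [19.99, 21.0],
    "Date/Time_exit": pd.to_datetime(["2015-11-11 11:30:00", None]),
    "Exit": [20.35, np.nan],
})

# Each column gets its own fill value; columns missing from the dict are left untouched.
filled = df.fillna({
    "Date/Time_exit": pd.Timestamp("2015-11-11 12:00:00"),
    "Exit": 4,
})
print(filled)
```

The drawback is that every column has to be listed by hand, which is what the dtype-based answers below avoid.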

2 answers:

Answer 0 (score: 1)

NaT is the missing-value marker for datetime columns, so you can exclude those columns when applying the fill.

u = df.select_dtypes(exclude=['datetime'])
df[u.columns] = u.fillna(4)
df

      Date/Time_entry    Entry      Date/Time_exit    Exit
0 2015-11-11 10:52:00  19.9900 2015-11-11 11:30:00  20.350
1 2015-11-11 11:36:00  20.4300 2015-11-11 11:38:00  20.565
2 2015-11-11 11:44:00  21.0000                 NaT   4.000
3 2009-04-20 10:28:00  13.7788 2009-04-20 10:46:00  13.700
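A self-contained version of this first step, rebuilding the question's frame by hand. The values are copied from the table above; constructing the datetime columns with pd.to_datetime is an assumption about how the original frame was created:

```python
import numpy as np
import pandas as pd

# The question's frame: row 2 has NaT in Date/Time_exit and NaN in Exit.
df = pd.DataFrame({
    "Date/Time_entry": pd.to_datetime([
        "2015-11-11 10:52:00", "2015-11-11 11:36:00",
        "2015-11-11 11:44:00", "2009-04-20 10:28:00",
    ]),
    "Entry": [19.99, 20.43, 21.0, 13.7788],
    "Date/Time_exit": pd.to_datetime([
        "2015-11-11 11:30:00", "2015-11-11 11:38:00",
        None, "2009-04-20 10:46:00",
    ]),
    "Exit": [20.35, 20.565, np.nan, 13.7],
})

# Select every column that is NOT datetime64 and fill only those with 4.
u = df.select_dtypes(exclude=["datetime"])
df[u.columns] = u.fillna(4)
print(df)
```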

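Similarly, to fill only the NaT values, change exclude to include in the code above and pass a date as the fill value. (This continues from the previous step, which is why Exit already shows 4.000.)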

u = df.select_dtypes(include=['datetime'])
df[u.columns] = u.fillna(pd.to_datetime('today'))
df

      Date/Time_entry    Entry             Date/Time_exit    Exit
0 2015-11-11 10:52:00  19.9900 2015-11-11 11:30:00.000000  20.350
1 2015-11-11 11:36:00  20.4300 2015-11-11 11:38:00.000000  20.565
2 2015-11-11 11:44:00  21.0000 2019-02-17 16:11:09.407466   4.000
3 2009-04-20 10:28:00  13.7788 2009-04-20 10:46:00.000000  13.700
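The include and exclude steps can also be folded into one fillna call by building a column-to-value dict from the dtypes. A minimal sketch; fill_by_kind is a hypothetical helper name, and a fixed pd.Timestamp is used instead of pd.to_datetime('today') so the result is reproducible:

```python
import numpy as np
import pandas as pd


def fill_by_kind(df: pd.DataFrame, date_fill, number_fill) -> pd.DataFrame:
    """Hypothetical helper: fill NaT in datetime64 columns with date_fill and
    missing values in all other columns with number_fill, returning a new frame."""
    dt_cols = df.select_dtypes(include=["datetime"]).columns
    # fillna applies each dict value only to the column named by its key.
    fill_values = {col: (date_fill if col in dt_cols else number_fill) for col in df.columns}
    return df.fillna(fill_values)


df = pd.DataFrame({
    "Date/Time_exit": pd.to_datetime(["2015-11-11 11:30:00", None]),
    "Exit": [20.35, np.nan],
})
filled = fill_by_kind(df, pd.Timestamp("2019-02-17"), 4)
print(filled)
```

Because fillna returns a new frame here, the original df keeps its NaT and NaN values, which avoids the order dependence seen in the two-step version above.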

Answer 1 (score: 1)

Try something like this with pandas.DataFrame.select_dtypes:

>>> import pandas as pd, datetime, numpy as np
>>> df = pd.DataFrame({'a': [datetime.datetime.now(), np.nan], 'b': [5, np.nan], 'c': [1, 2]})
>>> df
                           a    b  c
0 2019-02-17 18:06:15.231557  5.0  1
1                        NaT  NaN  2
>>> fill_dt = datetime.datetime.now()
>>> fill_value = 4
>>> dt_filled_df = df.select_dtypes('datetime').fillna(fill_dt)
>>> dt_filled_df
                           a
0 2019-02-17 18:06:15.231557
1 2019-02-17 18:06:36.040404
>>> value_filled_df = df.select_dtypes('number').fillna(fill_value)  # 'int' would miss b, which is float64 because it holds NaN
>>> value_filled_df
     b  c
0  5.0  1
1  4.0  2
>>> dt_filled_df.columns = [col + '_notnull' for col in dt_filled_df]
>>> value_filled_df.columns = [col + '_notnull' for col in value_filled_df]
>>> df = df.join(value_filled_df)
>>> df = df.join(dt_filled_df)
>>> df
                           a    b  c  b_notnull  c_notnull                  a_notnull
0 2019-02-17 18:06:15.231557  5.0  1        5.0          1 2019-02-17 18:06:15.231557
1                        NaT  NaN  2        4.0          2 2019-02-17 18:06:36.040404
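The same keep-the-originals approach can be written as a script, using DataFrame.add_suffix in place of the list-comprehension renames. A minimal sketch, assuming fixed datetimes instead of datetime.datetime.now() so the result is reproducible, and 'number' rather than 'int' so that float column b is filled too:

```python
import datetime

import numpy as np
import pandas as pd

df = pd.DataFrame({
    "a": pd.to_datetime([datetime.datetime(2019, 2, 17, 18, 6, 15), None]),
    "b": [5, np.nan],
    "c": [1, 2],
})

fill_dt = datetime.datetime(2019, 2, 17, 18, 6, 36)  # fixed fill date (assumption)
fill_value = 4

# Fill each dtype group separately and tag the filled copies with a suffix.
dt_filled = df.select_dtypes("datetime").fillna(fill_dt).add_suffix("_notnull")
num_filled = df.select_dtypes("number").fillna(fill_value).add_suffix("_notnull")

# join aligns on the index, so each original column sits alongside its filled copy.
result = df.join(num_filled).join(dt_filled)
print(result)
```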