Pandas: Fixing datetime.time and datetime.datetime mix

Time: 2019-04-17 01:22:15

Tags: python pandas

I have the following DataFrame, whose 'Time' column holds a mix of datetime types:

time_series_slice = tmp_df['XXX']
time_series_slice['Time types'] = time_series_slice['Time'].apply(lambda row: type(row))
time_series_slice['Time types'].value_counts()

<class 'datetime.datetime'>    97367
<class 'datetime.time'>           25
Name: Time types, dtype: int64

I cannot convert this whole 'Time' column to Pandas datetime with the pd.to_datetime() method, because it raises:

TypeError: <class 'datetime.time'> is not convertible to datetime

The approach time_series_slice['Time'].apply(lambda x: pd.Timestamp(x)) does not work either:

TypeError: Cannot convert input [00:00:00] of type <class 'datetime.time'> to Timestamp

I figure that these 25 datetime.time rows are giving me this headache, but I lack ideas on what to do with them.

Firstly, how do I force Pandas to display only these rows? time_series_slice[isinstance(time_series_slice['Time'], datetime.time)] gives me:

NameError: name 'datetime' is not defined

Secondly, how do I just convert all these values to Pandas datetime and move on? :(

UPDATE:

Adding sample data view:

0    2017-02-08 22:19:08.618000
1    2017-02-08 22:19:12.187000
2    2017-02-08 22:19:13.481000
3    2017-02-08 22:19:16.330000
4    2017-02-08 22:19:16.582000
Name: Time, dtype: object

UPDATE 2: Thanks to Wen-Ben's suggestion, I have filtered out the datetime.time rows, and they look like this:

time_series_slice['Time types'] = time_series_slice['Time'].apply(lambda row: type(row).__name__)
time_series_slice[time_series_slice['Time types'] == 'time']['Time']

96367    00:00:00
96368    00:00:00
96464    00:00:00
96465    00:00:00
96466    00:00:00
96467    00:00:00
96593    00:00:00
96862    00:00:00
Name: Time, dtype: object

Would the easiest way be to re-write them to a datetime.datetime object with all 0s?

1 answer:

Answer 0: (score: 1)

If you want to slice out those datetime.time rows, store the type name of each value and filter on it:

time_series_slice['Time types'] = time_series_slice['Time'].apply(lambda x: type(x).__name__)

time_series_slice['Time types'].value_counts()

time_series_slice[time_series_slice['Time types'] == 'time']
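The NameError in the question comes from the datetime module not being imported, and even with the import, isinstance() called on the whole Series tests the Series object itself rather than each value. A minimal sketch of an element-wise check follows; the sample frame df and the names is_time and time_rows are made up for illustration and are not from the original post.

```python
import datetime

import pandas as pd

# Hypothetical sample data mixing datetime.datetime and datetime.time values.
df = pd.DataFrame({
    "Time": pd.Series(
        [
            datetime.datetime(2017, 2, 8, 22, 19, 8, 618000),
            datetime.time(0, 0, 0),
            datetime.datetime(2017, 2, 8, 22, 19, 12, 187000),
            datetime.time(0, 0, 0),
        ],
        dtype=object,  # keep the raw Python objects as they are
    )
})

# Apply isinstance to each element to build a boolean mask.
is_time = df["Time"].apply(lambda value: isinstance(value, datetime.time))

# Use the mask to display only the datetime.time rows.
time_rows = df[is_time]
print(time_rows)
```

This avoids the extra 'Time types' helper column, since the mask can be used directly for indexing.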

Then use to_datetime to convert the column, casting the values to strings first:

time_series_slice['Time'] = pd.to_datetime(time_series_slice['Time'].astype(str))
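Casting to strings leaves the datetime.time rows as bare '00:00:00' values with no date, so depending on the pandas version the parser may fill in the current date or reject the mixed formats. As an alternative sketch, each datetime.time value can be combined with an explicit date before conversion. The placeholder date, the helper to_datetime_value and the sample Series below are assumptions for illustration only; a datetime "with all 0s" is not possible because year, month and day must be at least 1.

```python
import datetime

import pandas as pd

# Assumption: the date to attach to bare times; choose what fits your data.
PLACEHOLDER_DATE = datetime.date(1970, 1, 1)


def to_datetime_value(value):
    """Return a datetime.datetime, attaching PLACEHOLDER_DATE to bare times."""
    if isinstance(value, datetime.time):
        return datetime.datetime.combine(PLACEHOLDER_DATE, value)
    return value


# Hypothetical sample data shaped like the 'Time' column in the question.
times = pd.Series(
    [
        datetime.datetime(2017, 2, 8, 22, 19, 8, 618000),
        datetime.time(0, 0, 0),
        datetime.datetime(2017, 2, 8, 22, 19, 12, 187000),
    ],
    dtype=object,
)

# Every value is a datetime.datetime after the mapping, so to_datetime succeeds.
converted = pd.to_datetime(times.apply(to_datetime_value))
print(converted)
```

If the date of those rows is unknown, returning pd.NaT from the helper instead of a combined datetime marks them as missing rather than inventing a date.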