Cannot compare type 'Timestamp' with type 'str' (Pandas, Python)

Time: 2020-04-30 17:22:53

Tags: python pandas dataframe datetime compare

I have two dataframes with datetime columns:

df["datetime"] = df[["date","time"]].apply(lambda row: ' '.join(row.values.astype(str)), axis=1)
df["datetime"] = pd.to_datetime(df["datetime"], format='%Y-%m-%d  %H:%M:%S')
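A minimal sketch of the same step without the row-wise apply, assuming the date and time columns have string forms like YYYY-MM-DD and HH:MM:SS (as in the df sample further down). Concatenating the two columns as strings is vectorized, so no Python lambda runs per row; the two-row frame here is only a toy stand-in.

```python
import pandas as pd

# Toy frame shaped like the sample df (only the two columns used here)
df = pd.DataFrame({
    "date": ["2007-08-04", "2007-08-04"],
    "time": ["03:30:32", "03:30:33"],
})

# Join the two columns as strings, then parse once with an explicit format
df["datetime"] = pd.to_datetime(
    df["date"].astype(str) + " " + df["time"].astype(str),
    format="%Y-%m-%d %H:%M:%S",
)

print(df["datetime"].dtype)
```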

And the other one:

df_labels.columns = ["start_date","start_time","end_date","end_time","mode"]
df_labels["start_datetime"] = df_labels[["start_date","start_time"]].apply(lambda row: ' '.join(row.values.astype(str)), axis=1)
df_labels["end_datetime"] = df_labels[["end_date","end_time"]].apply(lambda row: ' '.join(row.values.astype(str)), axis=1)
df_labels["start_datetime"] = df_labels["start_datetime"].str.replace("/","-")
df_labels["end_datetime"] = df_labels["end_datetime"].str.replace("/","-")
df_labels["start_datetime"] = pd.to_datetime(df_labels["start_datetime"], format='%Y-%m-%d  %H:%M:%S')
df_labels["end_datetime"] = pd.to_datetime(df_labels["end_datetime"], format='%Y-%m-%d  %H:%M:%S')
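The str.replace("/", "-") step can probably be skipped. A sketch, assuming the raw label columns look like the df_labels sample (2007/06/26, 11:32:29): a format string that contains the slashes parses them directly. The toy rows are copied from that sample.

```python
import pandas as pd

# Raw label columns copied from the first two rows of the df_labels sample
df_labels = pd.DataFrame({
    "start_date": ["2007/06/26", "2008/03/28"],
    "start_time": ["11:32:29", "14:52:54"],
    "end_date": ["2007/06/26", "2008/03/28"],
    "end_time": ["11:40:29", "15:59:59"],
    "mode": ["bus", "train"],
})

fmt = "%Y/%m/%d %H:%M:%S"  # slashes kept in the format, so no str.replace needed
for side in ("start", "end"):
    df_labels[f"{side}_datetime"] = pd.to_datetime(
        df_labels[f"{side}_date"] + " " + df_labels[f"{side}_time"], format=fmt
    )

print(df_labels[["start_datetime", "end_datetime"]])
```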

All of the code above runs successfully.

Sample of df:

    lat         long        u1  alt d               date        time       datetime              mode
0   39.921712   116.472343  0   13  39298.146204    2007-08-04  03:30:32    2007-08-04 03:30:32 
1   39.921705   116.472343  0   13  39298.146215    2007-08-04  03:30:33    2007-08-04 03:30:33 
2   39.921695   116.472345  0   13  39298.146227    2007-08-04  03:30:34    2007-08-04 03:30:34 
3   39.921683   116.472342  0   13  39298.146238    2007-08-04  03:30:35    2007-08-04 03:30:35 
4   39.921672   116.472342  0   13  39298.146250    2007-08-04  03:30:36    2007-08-04 03:30:36 

Sample of df_labels:

    start_date  start_time  end_date    end_time    mode    start_datetime  end_datetime
0   2007/06/26  11:32:29    2007/06/26  11:40:29    bus 2007-06-26 11:32:29 2007-06-26 11:40:29
1   2008/03/28  14:52:54    2008/03/28  15:59:59    train   2008-03-28 14:52:54 2008-03-28 15:59:59
2   2008/03/28  16:00:00    2008/03/28  22:02:00    train   2008-03-28 16:00:00 2008-03-28 22:02:00
3   2008/03/29  01:27:50    2008/03/29  15:59:59    train   2008-03-29 01:27:50 2008-03-29 15:59:59
4   2008/03/29  16:00:00    2008/03/30  15:59:59    train   2008-03-29 16:00:00 2008-03-30 15:59:59

However, when I run this:

for index, row in df_labels.iterrows():
    df.loc[(df["datetime"] >= row["start_datetime"]) & (df["datetime"] < row["end_datetime"])] = row["mode"]

I get the following error:

TypeError: Cannot compare type 'Timestamp' with type 'str'

Please advise.

1 Answer:

Answer 0 (score: 0)

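Suppose the datetime values are in dd/mm/yy hh:mm:ss format. Then convert each datetime column with a matching explicit format (the samples in the question are year-first, e.g. 2007-08-04 03:30:32, so adjust format to your data):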

df['datetime'] = pd.to_datetime(df['datetime'], format='%d/%m/%y %H:%M:%S')
df_labels["start_datetime"] = pd.to_datetime(df_labels["start_datetime"], format='%d/%m/%y %H:%M:%S')
df_labels["end_datetime"] = pd.to_datetime(df_labels["end_datetime"], format='%d/%m/%y %H:%M:%S')
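A quick check of what this format string expects, using a made-up value 30/04/20 17:22:53 (hypothetical, not from the question's data): %d/%m/%y reads day, month and a two-digit year, and a year-first string such as the question's samples does not match it.

```python
import pandas as pd

FMT = "%d/%m/%y %H:%M:%S"  # day/month/two-digit year

# Hypothetical value in dd/mm/yy form
ts = pd.to_datetime("30/04/20 17:22:53", format=FMT)
print(ts)  # 2020-04-30 17:22:53

# A value in the question's YYYY-MM-DD form does not match this format
try:
    pd.to_datetime("2007-08-04 03:30:32", format=FMT)
    mismatch_raised = False
except ValueError:
    mismatch_raised = True
print(mismatch_raised)  # True
```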

Check the data types:

df.dtypes
df_labels.dtypes
After a correct conversion, the datetime columns should show datetime64[ns].
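A likely cause of the error is the loop in the question itself: df.loc[mask] = row["mode"] selects whole rows, so it writes the mode string into every column of the matching rows, datetime included. On the next iteration, df["datetime"] holds a mix of Timestamps and strings, and the comparison hits a str. A minimal sketch of the loop that assigns only to the mode column, using a few illustrative rows adapted from the samples above (the first two df timestamps are made up to fall inside the label windows):

```python
import pandas as pd

# Illustrative rows; only the last timestamp comes from the df sample
df = pd.DataFrame({
    "datetime": pd.to_datetime(["2007-06-26 11:35:00",
                                "2008-03-28 15:00:00",
                                "2007-08-04 03:30:32"]),
    "mode": [None, None, None],
})
# First two rows of the df_labels sample
df_labels = pd.DataFrame({
    "start_datetime": pd.to_datetime(["2007-06-26 11:32:29", "2008-03-28 14:52:54"]),
    "end_datetime": pd.to_datetime(["2007-06-26 11:40:29", "2008-03-28 15:59:59"]),
    "mode": ["bus", "train"],
})

for _, row in df_labels.iterrows():
    mask = (df["datetime"] >= row["start_datetime"]) & (df["datetime"] < row["end_datetime"])
    # Write only into "mode"; df.loc[mask] = ... would overwrite every column,
    # including "datetime", of the matching rows
    df.loc[mask, "mode"] = row["mode"]

print(df["mode"].tolist())  # ['bus', 'train', None]
print(df["datetime"].dtype)
```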

Additionally (for efficiency), join the two tables in SQL instead of looping:

import sqlite3

import pandas as pd

conn = sqlite3.connect(':memory:')

# Write both frames to the in-memory database as separate tables
df.to_sql('df', conn, index=False)
df_labels.to_sql('df_label', conn, index=False)

# Keep each df row whose datetime falls inside a label's [start, end) window.
# pandas stores the datetime columns as ISO-format text ('YYYY-MM-DD HH:MM:SS'),
# so plain string comparison orders them correctly.
qry = '''
    select df.*, lbl.mode as df_label_mode
    from df
    inner join df_label as lbl
        on df.datetime >= lbl.start_datetime
       and df.datetime <  lbl.end_datetime
    '''
df_x = pd.read_sql_query(qry, conn)
df_x.head()
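If adding SQLite is undesirable, the same inner join can be sketched in pandas alone with a cross merge (merge with how="cross", available since pandas 1.2) followed by a filter. This is a swapped-in technique, not the SQL approach above, and it materializes len(df) × len(df_labels) rows, so it only suits modest table sizes. The toy rows are illustrative, adapted from the question's samples.

```python
import pandas as pd

# Illustrative rows adapted from the question's samples
df = pd.DataFrame({
    "datetime": pd.to_datetime(["2007-06-26 11:35:00", "2008-03-28 15:00:00",
                                "2007-08-04 03:30:32"]),
})
df_labels = pd.DataFrame({
    "start_datetime": pd.to_datetime(["2007-06-26 11:32:29", "2008-03-28 14:52:54"]),
    "end_datetime": pd.to_datetime(["2007-06-26 11:40:29", "2008-03-28 15:59:59"]),
    "mode": ["bus", "train"],
})

# Pair every df row with every label row, then keep the pairs whose
# timestamp falls in [start_datetime, end_datetime)
pairs = df.merge(df_labels, how="cross")
in_range = (pairs["datetime"] >= pairs["start_datetime"]) & (pairs["datetime"] < pairs["end_datetime"])
df_x = pairs.loc[in_range, ["datetime", "mode"]].reset_index(drop=True)
print(df_x)
```

Rows that fall in no window (the 2007-08-04 one here) drop out, just as with the SQL inner join.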

Reference: Converting date column
