I have a pandas DataFrame df, shown below:
_sent_time_stamp distance duration duration_in_traffic Orig_lat
0 1456732800 1670 208 343 51.441092
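I want to convert the epoch time values (_sent_time_stamp) into two columns, one holding the date and the other the hour.
I defined two functions:
def date_convert(time):
    return time.date()
def hour_convert(time):
    return time.hour()
Then I apply these functions with a lambda to create the two new columns.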
df['date'] = Goo_results.apply(lambda row: date_convert(pd.to_datetime(row['_sent_time_stamp'], unit='s')), axis=1)
df['hour'] = Goo_results.apply(lambda row: hour_convert(pd.to_datetime(row['_sent_time_stamp'], unit='s')), axis=1)
The date column works, but the hour column does not, and I don't understand why:
TypeError: ("'int' object is not callable", u'occurred at index 0')
Answer 0 (score: 1)
You can remove the () after hour:
def date_convert(time):
    return time.date()
def hour_convert(time):
    return time.hour #remove ()
df['date'] = df.apply(lambda row: date_convert(pd.to_datetime(row['_sent_time_stamp'], unit='s')), axis=1)
df['hour'] = df.apply(lambda row: hour_convert(pd.to_datetime(row['_sent_time_stamp'], unit='s')), axis=1)
print df
_sent_time_stamp distance duration duration_in_traffic Orig_lat \
0 1456732800 1670 208 343 51.441092
date hour
0 2016-02-29 8
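The parentheses matter because, on a pandas Timestamp, date is a method while hour is a plain integer attribute, so time.hour() ends up trying to call an int. A minimal sketch of the difference, using the sample timestamp from the question (it uses Python 3 print syntax, unlike the Python 2 snippets in this thread, and the name error_message is only for illustration):

```python
import pandas as pd

# Single timestamp taken from the sample row in the question
ts = pd.to_datetime(1456732800, unit='s')

# date is a method: it has to be called to get a datetime.date
print(ts.date())

# hour is an attribute holding an int: read it without parentheses
print(ts.hour)

# Calling the int raises the same kind of TypeError as in the question
try:
    ts.hour()
except TypeError as exc:
    error_message = str(exc)
    print(error_message)
```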
A better and faster approach is to convert the whole column once and use the .dt accessor:
dat = pd.to_datetime(df['_sent_time_stamp'], unit='s')
df['date'] = dat.dt.date
df['hour'] = dat.dt.hour
print df
_sent_time_stamp distance duration duration_in_traffic Orig_lat \
0 1456732800 1670 208 343 51.441092
date hour
0 2016-02-29 8
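For anyone who wants to run the vectorized version on its own, here is a self-contained sketch. The one-row frame is rebuilt from the sample in the question, and the extra day column is a hypothetical addition, not part of the original answer: it shows dt.normalize(), which keeps a datetime64 column, whereas dt.date produces an object column of datetime.date values. Python 3 print syntax is assumed.

```python
import pandas as pd

# One-row frame rebuilt from the sample shown in the question
df = pd.DataFrame({
    '_sent_time_stamp': [1456732800],
    'distance': [1670],
    'duration': [208],
    'duration_in_traffic': [343],
    'Orig_lat': [51.441092],
})

# Convert the whole column in one call; the naive result corresponds to UTC
dat = pd.to_datetime(df['_sent_time_stamp'], unit='s')

df['date'] = dat.dt.date        # object column of datetime.date values
df['hour'] = dat.dt.hour        # integer column
df['day'] = dat.dt.normalize()  # hypothetical extra column: datetime64 at midnight

print(df[['date', 'hour', 'day']])
```

If the hour is needed in local time rather than UTC, the timestamps would have to be localized and converted first; which time zone applies is not stated in the question.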
Timings:
In [20]: %timeit new(df1)
1000 loops, best of 3: 827 µs per loop
In [21]: %timeit lamb(df)
The slowest run took 4.40 times longer than the fastest. This could mean that an intermediate result is being cached
1000 loops, best of 3: 1.13 ms per loop
Code:
df1 = df.copy()
def date_convert(time):
    return time.date()
def hour_convert(time):
    return time.hour
def lamb(df):
    df['date'] = df.apply(lambda row: date_convert(pd.to_datetime(row['_sent_time_stamp'], unit='s')), axis=1)
    df['hour'] = df.apply(lambda row: hour_convert(pd.to_datetime(row['_sent_time_stamp'], unit='s')), axis=1)
    return df
def new(df):
    dat = pd.to_datetime(df['_sent_time_stamp'], unit='s')
    df['date'] = dat.dt.date
    df['hour'] = dat.dt.hour
    return df
print lamb(df)
print new(df1)
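The timings above were taken on a one-row frame, where fixed overhead dominates. A rough way to repeat the comparison on more data is sketched below. The 1,000-row frame is made-up test data, the timeit module stands in for IPython's %timeit, and both functions work on copies so the input is left untouched. No particular timing result is implied.

```python
import timeit
import pandas as pd

# Hypothetical test data: 1,000 timestamps spaced one hour apart
df = pd.DataFrame({'_sent_time_stamp': 1456732800 + 3600 * pd.Series(range(1000))})

def lamb(frame):
    # Row-wise approach: one to_datetime call per row
    frame = frame.copy()
    frame['date'] = frame.apply(
        lambda row: pd.to_datetime(row['_sent_time_stamp'], unit='s').date(), axis=1)
    frame['hour'] = frame.apply(
        lambda row: pd.to_datetime(row['_sent_time_stamp'], unit='s').hour, axis=1)
    return frame

def new(frame):
    # Vectorized approach: one to_datetime call for the whole column
    frame = frame.copy()
    dat = pd.to_datetime(frame['_sent_time_stamp'], unit='s')
    frame['date'] = dat.dt.date
    frame['hour'] = dat.dt.hour
    return frame

result_lamb = lamb(df)
result_new = new(df)

# Time three runs of each; absolute numbers depend on machine and pandas version
print('lamb:', timeit.timeit(lambda: lamb(df), number=3))
print('new: ', timeit.timeit(lambda: new(df), number=3))
```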