How to truncate minutes and seconds from a pandas.tslib.Timestamp

Date: 2016-07-08 16:13:10

Tags: python pandas timestamp truncate seconds

I am using Cloudera VM 5.2 and pandas 0.18.0.

I have the following data:

import pandas as pd

# Parse the 'timestamp' column as datetimes and add a constant adCount column.
adclicksDF = pd.read_csv('/home/cloudera/Eglence/ad-clicks.csv',
                         parse_dates=['timestamp'],
                         skipinitialspace=True).assign(adCount=1)

adclicksDF.head(n=5)
Out[107]: 
            timestamp  txId  userSessionId  teamId  userId  adId   adCategory  \
0 2016-05-26 15:13:22  5974           5809      27     611     2  electronics   
1 2016-05-26 15:17:24  5976           5705      18    1874    21       movies   
2 2016-05-26 15:22:52  5978           5791      53    2139    25    computers   
3 2016-05-26 15:22:57  5973           5756      63     212    10      fashion   
4 2016-05-26 15:22:58  5980           5920       9    1027    20     clothing   



   adCount  
0        1  
1        1  
2        1  
3        1  
4        1  

The data types of the fields:

for col in adclicksDF:
    print(col)
    print(type(adclicksDF[col][1]))


timestamp
<class 'pandas.tslib.Timestamp'>
txId
<class 'numpy.int64'>
userSessionId
<class 'numpy.int64'>
teamId
<class 'numpy.int64'>
userId
<class 'numpy.int64'>
adId
<class 'numpy.int64'>
adCategory
<class 'str'>
adCount
<class 'numpy.int64'>

I want to truncate the minutes and seconds in the timestamp.

I tried:

adclicksDF["timestamp"] = pd.to_datetime(adclicksDF["timestamp"],format='%Y-%m-%d %H')

adclicksDF.head(n=5)
Out[110]: 
            timestamp  txId  userSessionId  teamId  userId  adId   adCategory  \
0 2016-05-26 15:13:22  5974           5809      27     611     2  electronics   
1 2016-05-26 15:17:24  5976           5705      18    1874    21       movies   
2 2016-05-26 15:22:52  5978           5791      53    2139    25    computers   
3 2016-05-26 15:22:57  5973           5756      63     212    10      fashion   
4 2016-05-26 15:22:58  5980           5920       9    1027    20     clothing   

   adCount  
0        1  
1        1  
2        1  
3        1  
4        1  

This does not truncate the minutes and seconds.

How can I truncate the minutes and seconds?

3 answers:

Answer 0 (score: 3)

You can use:

# Parentheses are needed so the chained call can continue on the next line.
adclicksDF["timestamp"] = (pd.to_datetime(adclicksDF["timestamp"])
                             .apply(lambda x: x.replace(minute=0, second=0)))


print (adclicksDF)
            timestamp  txId  userSessionId  teamId  userId  adId   adCategory
0 2016-05-26 15:00:00  5974           5809      27     611     2  electronics
1 2016-05-26 15:00:00  5976           5705      18    1874    21       movies
2 2016-05-26 15:00:00  5978           5791      53    2139    25    computers
3 2016-05-26 15:00:00  5973           5756      63     212    10      fashion
4 2016-05-26 15:00:00  5980           5920       9    1027    20     clothing

print (type(adclicksDF.loc[0, 'timestamp']))
<class 'pandas.tslib.Timestamp'>
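
The same approach can be tried without the CSV. The following is only a minimal sketch: the sample timestamps and the names `ts` and `truncated` are made up for illustration. Note that `replace(minute=0, second=0)` leaves any sub-second part untouched, so the sketch also passes `microsecond=0`, which matters only if the data carries fractional seconds.

```python
import pandas as pd

# Hypothetical sample values standing in for the 'timestamp' column.
ts = pd.Series([
    pd.Timestamp("2016-05-26 15:13:22.500"),
    pd.Timestamp("2016-05-26 16:17:24"),
])

# Zero out every field below the hour on each Timestamp.
truncated = ts.apply(lambda x: x.replace(minute=0, second=0, microsecond=0))
print(truncated)
```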

If you need the output as a string, use dt.strftime:

adclicksDF["timestamp"] = pd.to_datetime(adclicksDF["timestamp"]).dt.strftime('%Y-%m-%d %H')
print (adclicksDF)
       timestamp  txId  userSessionId  teamId  userId  adId   adCategory
0  2016-05-26 15  5974           5809      27     611     2  electronics
1  2016-05-26 15  5976           5705      18    1874    21       movies
2  2016-05-26 15  5978           5791      53    2139    25    computers
3  2016-05-26 15  5973           5756      63     212    10      fashion
4  2016-05-26 15  5980           5920       9    1027    20     clothing

print (type(adclicksDF.loc[0, 'timestamp']))
<class 'str'>
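
A short sketch of what the string route gives you, using invented sample data (`ts`, `as_text` and `parsed` are illustrative names). The values become plain strings, so the `.dt` accessor no longer applies to them. As far as I can tell, the `format` argument of `pd.to_datetime` only matters when parsing strings like these, which would explain why the attempt in the question left the already-parsed column unchanged.

```python
import pandas as pd

# Hypothetical sample values standing in for the 'timestamp' column.
ts = pd.Series([
    pd.Timestamp("2016-05-26 15:13:22"),
    pd.Timestamp("2016-05-26 15:17:24"),
])

# Format each timestamp down to the hour; the result holds str values.
as_text = ts.dt.strftime('%Y-%m-%d %H')
print(as_text.tolist())

# Parsing the hour-only strings back yields timestamps on the hour.
parsed = pd.to_datetime(as_text, format='%Y-%m-%d %H')
print(parsed)
```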

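Edit:

A better solution is to use dt.floor, as in Alex's answer.

Answer 1 (score: 2)

Since pandas 0.18, pd.Timestamp has a floor method that rounds down to a given resolution; it is also available on a datetime column through the .dt accessor: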

adclicksDF["timestamp"] = adclicksDF.timestamp.dt.floor('h')
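
For anyone who wants to try this without the CSV, here is a minimal self-contained sketch. The small DataFrame `df` is invented for illustration and only mimics two of the question's columns.

```python
import pandas as pd

# Hypothetical stand-in for adclicksDF with a parsed datetime column.
df = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2016-05-26 15:13:22",
        "2016-05-26 15:17:24",
        "2016-05-26 16:22:52",
    ]),
    "adCount": [1, 1, 1],
})

# Round every timestamp down to the start of its hour; the dtype stays datetime.
df["timestamp"] = df["timestamp"].dt.floor("h")
print(df)
```

Unlike the strftime route, the column remains a datetime column, so further date arithmetic or the `.dt` accessor keeps working on it.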

Answer 2 (score: 0)

Try:

pd.to_datetime(adclicksDF.timestamp).dt.strftime('%Y-%m-%d %H')

After assigning the result back to the column:

adclicksDF.timestamp = pd.to_datetime(adclicksDF.timestamp).dt.strftime('%Y-%m-%d %H')

adclicksDF
