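I am reading three different datasets with pd.read_csv. One column of the data is a time in seconds, and I want to parse it with a function I wrote for the date_parser parameter of pd.read_csv. It works fine when all the values are integers. However, when a dataset contains strings or floats, my function fails. I think the problem is in the datetime.datetime.fromtimestamp(float(time_in_secs)) part of my function. Does anyone know how to make it work for all of my datasets? I am completely stuck. The three datasets look like this: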
Dataset 1
555,1404803485,800
555,1408906759,900
Dataset 2
231,1404803485,pass
231,1404803490,fail
Dataset 3
16010925,1403890894,40.5819880696
16010925,1903929273,40.5819880696
import datetime
import pandas as pd

def dateparse(time_in_secs):
    if isinstance(time_in_secs, str):
        # '\N' marks a missing value in the data
        if time_in_secs == '\\N':
            time_in_secs = 0
    tm = datetime.datetime.fromtimestamp(float(time_in_secs))
    # round down to the 10-minute boundary
    tm = tm - datetime.timedelta(
        minutes=tm.minute % 10, seconds=tm.second, microseconds=tm.microsecond)
    return tm
pd.read_csv('dataset_here.csv',
            delimiter=',', index_col=[0,1], parse_dates=['Timestamp'],
            date_parser=dateparse, names=['Serial', 'Timestamp', 'result'])
Answer 0 (score: 2)
I believe you need to convert every time that arrives as a string to 0, because the solution with float itself works well:
def dateparse(time_in_secs):
    if isinstance(time_in_secs, str):
        #https://stackoverflow.com/a/45372194
        #time_in_secs = 86400
        time_in_secs = 0
    #print (time_in_secs)
    tm = datetime.datetime.fromtimestamp(float(time_in_secs))
    tm = tm - datetime.timedelta(
        minutes=tm.minute % 10, seconds=tm.second, microseconds=tm.microsecond)
    return tm
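The timedelta subtraction in dateparse rounds a timestamp down to the nearest 10-minute boundary. The following is only an illustrative sketch of that one step, isolated so it can be checked without any time zone involved; the helper name floor_to_10_minutes and the sample datetime are made up for the illustration and are not part of the original code.

```python
import datetime

def floor_to_10_minutes(tm):
    """Drop everything below the 10-minute boundary (hypothetical helper)."""
    return tm - datetime.timedelta(
        minutes=tm.minute % 10, seconds=tm.second, microseconds=tm.microsecond)

# A made-up sample value: 07:11:25.123456 should become 07:10:00.
example = datetime.datetime(2014, 7, 8, 7, 11, 25, 123456)
floored = floor_to_10_minutes(example)
print(floored)  # 2014-07-08 07:10:00
```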
A more general solution: try to convert the value to a float, and assign a default value if that is not possible:
def dateparse(time_in_secs):
    if isinstance(time_in_secs, str):
        try:
            time_in_secs = float(time_in_secs)
        except ValueError:
            #https://stackoverflow.com/a/45372194
            #time_in_secs = 86400
            time_in_secs = 0
    #print (time_in_secs)
    tm = datetime.datetime.fromtimestamp(float(time_in_secs))
    tm = tm - datetime.timedelta(
        minutes=tm.minute % 10, seconds=tm.second, microseconds=tm.microsecond)
    return tm
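To see how the try/except conversion treats the kinds of values that appear in the three datasets, the conversion step can be pulled out on its own. This is a sketch under assumptions rather than the answer's code: to_seconds and its default parameter are hypothetical names, and the extra handling of None, NaN and infinity goes beyond what the answer does.

```python
import math

def to_seconds(value, default=0.0):
    """Return value as a finite float, or default if it cannot be converted.

    Hypothetical helper, not part of the original answer.
    """
    try:
        seconds = float(value)
    except (TypeError, ValueError):
        return default
    # float('nan') and float('inf') convert without error, but
    # datetime.datetime.fromtimestamp raises on them, so fall back too.
    return seconds if math.isfinite(seconds) else default

# Integers, numeric strings, the '\N' marker, arbitrary text and None.
for raw in (1404803485, "1404803485", "40.5819880696", "\\N", "test", None):
    print(repr(raw), "->", to_seconds(raw))
```

The result of to_seconds could then be passed to datetime.datetime.fromtimestamp inside dateparse. On Windows the default may need to be 86400 rather than 0, as the commented-out line in the answer suggests.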
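Sample, tested under Windows (fromtimestamp returns local time, so the printed values depend on the machine's time zone):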
import datetime
from io import StringIO

import pandas as pd

def dateparse(time_in_secs):
    if isinstance(time_in_secs, str):
        try:
            time_in_secs = float(time_in_secs)
        except ValueError:
            #https://stackoverflow.com/a/45372194
            #time_in_secs = 0
            time_in_secs = 86400
            print (time_in_secs)
    tm = datetime.datetime.fromtimestamp(float(time_in_secs))
    tm = tm - datetime.timedelta(
        minutes=tm.minute % 10, seconds=tm.second, microseconds=tm.microsecond)
    return tm

temp=u"""16010925,test,40.5819880696
16010925,1903929273,40.5819880696"""
#after testing replace 'StringIO(temp)' with 'filename.csv'
#(pd.compat.StringIO was removed from newer pandas versions, so io.StringIO is used here)
df = pd.read_csv(StringIO(temp), index_col=[0,1], parse_dates=['Timestamp'],
                 date_parser=dateparse, names=['Serial', 'Timestamp', 'result'])
print (df)
                                 result
Serial   Timestamp
16010925 1970-01-02 01:00:00  40.581988
         2030-05-02 07:10:00  40.581988
print (df.index.get_level_values(1))
DatetimeIndex(['1970-01-02 01:00:00', '2030-05-02 07:10:00'],
              dtype='datetime64[ns]', name='Timestamp', freq=None)
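As a possible alternative that avoids a per-value parser function, the column could be read as-is and converted afterwards with vectorized pandas calls. This is only a sketch and not the answer's method, and it differs in two ways that should be treated as assumptions: pd.to_datetime with unit='s' interprets the numbers as UTC rather than local time, and unparseable entries become NaT instead of a default date. The variable names csv_text and seconds are made up for the example.

```python
import io

import pandas as pd

# Same sample rows as in the answer above.
csv_text = "16010925,test,40.5819880696\n16010925,1903929273,40.5819880696\n"

df = pd.read_csv(io.StringIO(csv_text), names=['Serial', 'Timestamp', 'result'])

# Non-numeric entries such as 'test' become NaN instead of raising.
seconds = pd.to_numeric(df['Timestamp'], errors='coerce')

# Interpret the numbers as seconds since the epoch (UTC); NaN becomes NaT.
# Then round down to the 10-minute boundary.
df['Timestamp'] = pd.to_datetime(seconds, unit='s').dt.floor('10min')

df = df.set_index(['Serial', 'Timestamp'])
print(df)
```

If a default date is preferred over NaT, seconds.fillna(86400) could be applied before the conversion.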