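I work with meteorological data. Because of power outages at the station, some periods have no timestamps, and I need to create those timestamps and fill them with NaN. Generating the times themselves is not a problem (the data are sampled at 10 Hz). However, when the station comes back online, the sub-second rounding of its timestamps no longer matches the rounding of the dates I use to build the new DataFrame. So I end up with one set of timestamps for the outage, filled with NaN, and another set that already exists from after power returned. I can create the DataFrame, but when I join the two with pandas, the result contains both the dates I created and the dates the data already had, all because of the rounding.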
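The effect described above can be reproduced on a tiny scale. The following is a minimal sketch with made-up timestamps and values (the `.056` and `.006` offsets and the column `u` are illustrative assumptions, not the real station data): an outer join on two 10 Hz indexes that differ only in their sub-second offset keeps every timestamp from both sides.

```python
import pandas as pd

# Hypothetical recorded samples: 10 Hz, with a .056 s sub-second offset.
recorded = pd.DataFrame(
    {"u": [1.0, 2.0, 3.0]},
    index=pd.date_range("2016-03-09 00:00:00.056", periods=3, freq="100ms"),
)

# Hypothetical gap-filling grid: also 10 Hz, but on a .006 s offset.
filler = pd.DataFrame(
    index=pd.date_range("2016-03-09 00:00:00.006", periods=5, freq="100ms")
)

# No timestamp matches exactly, so the outer join is the union of both indexes.
joined = recorded.join(filler, how="outer")
print(len(joined))  # 8 rows: 3 recorded + 5 filler, none merged
```

Because no pair of timestamps is exactly equal, nothing is aligned and the row count grows instead of staying on a single 10 Hz grid.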
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from datetime import datetime
from os import listdir
from os.path import isfile, join
def dateparse(a, b):
    # Combine the date column and the time column into a single timestamp.
    data = str(a) + ' ' + str(b)
    return pd.to_datetime(data, format='%Y-%m-%d %H:%M:%S:%f')
df = pd.read_csv(
    './CSV_PP_110_2016_010_0000.dat',
    sep=',',
    header=None,
    names=None,
    index_col=0,
    na_values=["-999.99"],
    usecols=[0,1,2,3,4,5,6,7,8,9,10,11,12,13],
    parse_dates=[[0,1]],
    date_parser=dateparse,
    dtype={i: np.float32 for i in range(3, 14)},
)
df.columns = ['u', 'v', 'w', 'Ts','CO2', 'H2O','Pressao','DiagCsat','CH4','T','sinal_CH4', 'Diag_ch4']
df['cod'] = '110'
df['cod_99'] = '-999.99'
df['ano'] = df.index.strftime('%Y')
df['dj'] = df.index.strftime('%j')
df['hr'] = df.index.strftime('%H%M')
df['seg_fre'] = df.index.strftime('%S.%f')
ano_i = df.index.strftime('%Y')[0]
ano_f = df.index.strftime('%Y')[-1]
dia_i = df.index.strftime('%d')[0]
dia_f = df.index.strftime('%d')[-1]
mes_i = df.index.strftime('%m')[0]
mes_f = df.index.strftime('%m')[-1]
df.seg_fre = (round(df.seg_fre.astype(float),1))
df.u = (round((df.u*13.1072/6).astype(float),5))
df.v = (round((df.v*13.1072/6).astype(float),5))
df.w = (round((df.w*1.6384).astype(float),5))
df.Ts = (round((df.Ts-10).astype(float),5))
df_index_i = df.index[0].strftime('%Y-%m-%d %H:%M:%S.%f')
df_index_f = df.index[-1].strftime('%Y-%m-%d %H:%M:%S.%f')
compare_i = ''+ str(ano_i)+'-'+ str(mes_i)+'-'+str(dia_i)+' ''23:59:59.906000'
compare_f = ''+ str(ano_f)+'-'+ str(mes_f)+'-'+str(dia_f)+' ''23:59:59.806000'
compare_ii = ''+ str(ano_i)+'-'+ str(mes_i)+'-'+str(dia_i)+' ''23:59:59.913000'
compare_ff = ''+ str(ano_f)+'-'+ str(mes_f)+'-'+str(dia_f)+' ''23:59:59.813000'
if df.shape[0] == 864000:
    # A full day at 10 Hz has 864000 rows, so the file can be written as is.
    df.to_csv('./CSV_110_'+df.ano[3]+'_'+df.dj[3]+'_0000.csv',sep=",",header=False,columns=['cod','ano','dj','hr','seg_fre','u', 'v', 'w', 'Ts','CO2', 'H2O', 'DiagCsat', 'CH4', 'sinal_CH4', 'Diag_ch4', 'T','Pressao'],index=False,na_rep='-999.99')
else:
    if df_index_i == compare_i:
        start_date = pd.to_datetime(compare_i)
        end_date = pd.to_datetime(compare_f)
        # date_range needs exactly three of start, end, periods and freq, so periods is omitted.
        d = pd.DataFrame(index=pd.date_range(start=start_date, end=end_date, freq='0.1S'))
        result = df.join(d, how='outer')
        result.to_csv('/home/lucas/Teste_padronizar/teste_1_mes/saida/CSV_110_'+df.ano[3]+'_'+df.dj[3]+'_0000.csv',sep=",",header=False,columns=['cod','ano','dj','hr','seg_fre','u', 'v', 'w', 'Ts','CO2', 'H2O', 'DiagCsat', 'CH4', 'sinal_CH4', 'Diag_ch4', 'T','Pressao'],index=False,na_rep='-999.99')
    else:
        # Note: f is not defined anywhere in this excerpt.
        print('erro index', f)
The index of the loaded `df` is:
In [19]: df.index[0:5]
Out[19]:
DatetimeIndex(['2016-03-08 23:59:59.956000', '2016-03-09 00:00:00.056000',
'2016-03-09 00:00:00.156000', '2016-03-09 00:00:00.256000',
'2016-03-09 00:00:00.356000'],
dtype='datetime64[ns]', name='0_1', freq=None)
But after the station comes back online, the timestamps look like this:
In [17]: df.index[860000]
Out[18]: Timestamp('2016-03-09 23:55:41.006000')
And the result when I join is:
In[27]: result.index[800000:800005]
Out[27]:
DatetimeIndex(['2016-03-09 12:03:44.006000', '2016-03-09 12:03:44.106000',
'2016-03-09 12:03:44.206000', '2016-03-09 12:03:44.306000',
'2016-03-09 12:03:44.406000'],
dtype='datetime64[ns]', freq=None)
I suspect there is a function other than the pandas `join` that would handle this, but I could not find one.
Answer 0 (score: 0)
Solved by using:
df.index = df.index.round('0.1S')
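As a rough sketch of how this rounding fits together with the gap filling, the example below snaps a small made-up index to the nearest tenth of a second and then reindexes it onto a regular 10 Hz grid. The sample timestamps, the values in column `u`, and the use of `reindex` in place of `join` are assumptions for illustration and are not part of the original answer; `'100ms'` is used as an equivalent spelling of `'0.1S'`.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the station data: two 10 Hz blocks whose
# sub-second offsets differ (.056 before the outage, .006 after it).
before = pd.date_range("2016-03-09 00:00:00.056", periods=5, freq="100ms")
after = pd.date_range("2016-03-09 00:00:01.006", periods=5, freq="100ms")
df = pd.DataFrame({"u": np.arange(10, dtype=float)}, index=before.append(after))

# Snap every timestamp to the nearest tenth of a second, as in the answer.
df.index = df.index.round("100ms")

# Guard against two samples rounding to the same instant (keeps the first).
df = df[~df.index.duplicated(keep="first")]

# Build the full regular grid; timestamps absent from df become NaN rows.
grid = pd.date_range(
    start="2016-03-09 00:00:00.000",
    end="2016-03-09 00:00:01.900",
    freq="100ms",
)
result = df.reindex(grid)

print(len(result), int(result["u"].isna().sum()))
```

Once both sides sit on the same 100 ms grid, the recorded rows and the generated rows align exactly, so the output has one row per grid point. If changing the index is undesirable, `reindex` also accepts `method='nearest'` together with a `tolerance`, which may achieve a similar alignment.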