concat turns values into NaN

Asked: 2016-06-29 17:01:11

Tags: python pandas

I have this code:

gg=df_met[['Less','Middle','Greater']].resample('h').mean()
Filtered_mean=Filtered[['Conc']].resample('h').mean()

result = pd.concat([Filtered_mean, gg], axis=1, join_axes=[df1.index])
Reduced_result=result.dropna(axis=0,how='any')

gg is a DataFrame that looks like this:

                         Less    Middle   Greater
Date
2004-02-27 00:00:00  0.000000  1.000000  0.000000
2004-02-27 01:00:00  0.000000  1.000000  0.000000
2004-02-27 02:00:00  0.000000  1.000000  0.000000
2004-02-27 03:00:00  0.083333  0.916667  0.000000
2004-02-27 04:00:00  0.583333  0.416667  0.000000
2004-02-27 05:00:00  0.083333  0.916667  0.000000
2004-02-27 06:00:00  0.666667  0.333333  0.000000
2004-02-27 07:00:00  0.750000  0.250000  0.000000
2004-02-27 08:00:00  0.250000  0.750000  0.000000
2004-02-27 09:00:00  1.000000  0.000000  0.000000
2004-02-27 10:00:00  0.250000  0.750000  0.000000
2004-02-27 11:00:00  1.000000  0.000000  0.000000
2004-02-27 12:00:00  0.916667  0.083333  0.000000
2004-02-27 13:00:00  0.000000  1.000000  0.000000
2004-02-27 14:00:00  0.000000  1.000000  0.000000
2004-02-27 15:00:00  0.000000  1.000000  0.000000
2004-02-27 16:00:00  0.000000  1.000000  0.000000
2004-02-27 17:00:00  0.000000  1.000000  0.000000
2004-02-27 18:00:00  0.000000  1.000000  0.000000
2004-02-27 19:00:00  0.083333  0.916667  0.000000
2004-02-27 20:00:00  0.000000  0.500000  0.500000
2004-02-27 21:00:00  0.000000  0.000000  1.000000
2004-02-27 22:00:00  0.000000  0.000000  1.000000
2004-02-27 23:00:00  0.000000  0.000000  1.000000
2004-02-28 00:00:00  0.000000  0.666667  0.333333
2004-02-28 01:00:00  0.000000  0.833333  0.166667
2004-02-28 02:00:00  0.000000  0.166667  0.833333
2004-02-28 03:00:00  0.000000  0.000000  1.000000
2004-02-28 04:00:00  0.000000  0.000000  1.000000
2004-02-28 05:00:00  0.000000  0.000000  1.000000

Filtered_mean is:

                       Conc
2004-02-27 15:00  30.166667
2004-02-27 16:00  24.218182
2004-02-27 17:00  44.781818
2004-02-27 18:00  15.200000
2004-02-27 19:00  33.490000
2004-02-27 20:00  17.100000
2004-02-27 21:00  15.470000
2004-02-27 22:00  13.100000
2004-02-27 23:00  17.736364
2004-02-28 00:00  19.225000
2004-02-28 01:00   9.760000
2004-02-28 02:00   2.737500
2004-02-28 03:00   4.175000
2004-02-28 04:00   2.990000
2004-02-28 05:00   4.983333
2004-02-28 06:00   3.370000
2004-02-28 07:00   2.983333
2004-02-28 08:00   3.508333
2004-02-28 09:00   2.641667
2004-02-28 10:00   4.916667
2004-02-28 11:00   7.100000
2004-02-28 12:00  11.609091
2004-02-28 13:00   5.540000
2004-02-28 14:00   3.025000
2004-02-28 15:00   5.127273
2004-02-28 16:00  11.660000
2004-02-28 17:00   5.833333
2004-02-28 18:00   8.183333
2004-02-28 19:00  -0.158333
2004-02-28 20:00   6.575000

When I concatenate them, I get:

                      Conc  Less  Middle  Greater
Date                                              
2004-02-27 15:00  30.166667   NaN     NaN      NaN
2004-02-27 15:00  30.166667   NaN     NaN      NaN
2004-02-27 15:00  30.166667   NaN     NaN      NaN
2004-02-27 16:00  24.218182   NaN     NaN      NaN

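Is this because the index of one frame is an integer type:

dtype='int64', length=34342, freq='H')

while the index of gg is datetime?

dtype='datetime64[ns]', name='Date', length=42479, freq='H')

If so, how do I convert one frame's index to the type of the other?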
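
One way to check that hypothesis is to compare the two index objects directly. The following is a minimal sketch, not the asker's code: `gg_demo` and `filtered_demo` are made-up stand-ins, and `index_kind` and `same_index_type` are hypothetical helpers. Giving `filtered_demo` a PeriodIndex is an assumption, based on the `df_met.index.to_period('h')` call in the full code and on older pandas versions printing `dtype='int64'` together with a `freq` for a PeriodIndex.

```python
import pandas as pd

hours = pd.date_range("2004-02-27 15:00", periods=3, freq="h", name="Date")

# Hypothetical stand-ins: gg_demo keeps a DatetimeIndex,
# filtered_demo is given a PeriodIndex (an assumption, see above).
gg_demo = pd.DataFrame({"Less": [0.0, 0.0, 0.0]}, index=hours)
filtered_demo = pd.DataFrame(
    {"Conc": [30.166667, 24.218182, 44.781818]},
    index=hours.to_period("h"),
)


def index_kind(df):
    """Return the class name of a frame's index, e.g. 'DatetimeIndex'."""
    return type(df.index).__name__


def same_index_type(left, right):
    """True only when both indexes share a class and a dtype."""
    return (
        type(left.index) is type(right.index)
        and str(left.index.dtype) == str(right.index.dtype)
    )


print(index_kind(gg_demo), index_kind(filtered_demo))
print(same_index_type(gg_demo, filtered_demo))
```

If the two frames report different index classes, their labels cannot match during alignment, which is consistent with the all-NaN columns shown above.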

Full code:

import pandas as pd
import datetime as dt
import io 
import numpy as np
names=['Date','Wind Speed','Wind Direction']
df2 = pd.read_csv('Met_12_13.csv', index_col=0, names=names, parse_dates=[0])
df_met=df2
df_met.insert(2,'Less','Nan')
df_met.insert(3,'Middle','Nan')
df_met.insert(4,'Greater','Nan')
for line in df2:
    flag1=(df2['Wind Speed']<4)
    flag1=flag1.astype(int)
    flag2=(df2['Wind Speed']>=4 ) & (df2['Wind Speed']<=10)
    flag2=flag2.astype(int)
    flag3=(df2['Wind Speed']>10)
    flag3=flag3.astype(int)

    df_met['Less']=flag1
    df_met['Middle']=flag2
    df_met['Greater']=flag3



aethalometer=['Date','Chanel0','Chanel1','Chanel2','Chanel3','Chanel4','Chanel5','Chanel6','Chanel7']
#df1=pd.read_csv('result.txt', index_col=0,sep='\n', names=aethalometer, parse_dates=[0])
df1 = pd.read_csv('Ath_12_13.csv', sep=',', names=aethalometer ) #Spirows=1
df1['Date'] = pd.to_datetime(df1['Date'], errors='coerce')
for y in range (0,6):
    x=y+1
    df1[aethalometer[x]]= pd.to_numeric(df1[aethalometer[x]], errors='coerce')
    df1=df1[df1[aethalometer[x]]>-250]
    df1=df1[df1[aethalometer[x]]<500]
    df1['Date'] = pd.to_datetime(df1['Date'], errors='coerce')
    df1.index



print(len(df1))
#df1 = pd.read_csv(io.StringIO('Output14.csv'), parse_dates=[0], names=['Date','A','B','C','D','E','F','G', 'H'])
#df_mean = df1[['Conc']].resample('h').mean()
print("here")

#df1.index = df1.index.to_period('h')
df_met['per'] = df_met.index.to_period('h')

#df_mean.index=df_mean.index.to_period('h')
#print(len(df_mean)) 
pers = df_met.loc[(df2['Wind Direction'] > 340) | (df_met['Wind Direction'] < 12) , 'per'].unique()

print (pers)
print("here")
#%%
Filtered=df1.drop(pers)
#del Filtered['Date']

a=Filtered['Chanel1']
a.index = pd.to_datetime(a.index, errors='coerce')

b=Filtered['Chanel2']
b.index = pd.to_datetime(b.index, errors='coerce')

c=Filtered['Chanel3']
c.index = pd.to_datetime(c.index, errors='coerce')

d=Filtered['Chanel4']
d.index = pd.to_datetime(d.index, errors='coerce')

e=Filtered['Chanel5']
e.index = pd.to_datetime(e.index, errors='coerce')

f=Filtered['Chanel0']
f.index = pd.to_datetime(f.index, errors='coerce')

g=Filtered['Chanel7']
g.index = pd.to_datetime(g.index, errors='coerce')

a=a.resample('h').mean()
a_median=a.resample('h').median()  #This is how you would make it median
b=b.resample('h').mean()
c=c.resample('h').mean()
d=d.resample('h').mean()
e=e.resample('h').mean()
f=f.resample('h').mean()
g = pd.to_numeric(g, errors='coerce')
g=g.resample('h').mean()

Series=pd.concat([a,b,c,d,e,f,g],join='outer',axis=1)

gg=df_met[['Less','Middle','Greater']].resample('h').mean()

result_mean = pd.concat([Series, gg], axis=1, join_axes=[gg.index])
Reduced_result_mean=result_mean.dropna(axis=0,how='any')

Reduced_result_mean.to_csv("Final2012-13.csv")

1 answer:

Answer 0 (score: 2)

Yes, that is the cause. Both DataFrames need the same index type for concat to align their rows.

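Use:

Filtered_mean.reset_index(inplace=True)
# assumes the index is named 'date'; an unnamed index becomes a column called 'index'
Filtered_mean['date']=pd.to_datetime(Filtered_mean['date'])
Filtered_mean.set_index('date',inplace=True)

Now both Filtered_mean and gg should have a datetime index.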
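
The effect can be reproduced on a small scale. The sketch below is only an illustration under stated assumptions: the frames are made-up stand-ins for gg and Filtered_mean, and `align_and_concat` is a hypothetical helper. It covers two possible causes, an index of timestamp strings (fixed with `pd.to_datetime`) and a PeriodIndex (fixed with `to_timestamp()`). Because `join_axes` was removed from `pd.concat` in pandas 1.0, the helper uses `reindex` to keep only the rows of the left frame.

```python
import pandas as pd

hours = pd.date_range("2004-02-27 15:00", periods=3, freq="h", name="Date")

# Hypothetical stand-in for gg, with a proper DatetimeIndex.
gg = pd.DataFrame({"Less": [0.0, 0.0, 0.0], "Middle": [1.0, 1.0, 1.0]}, index=hours)
conc = [30.166667, 24.218182, 44.781818]

# Case 1 (assumption): the index holds timestamp strings.
as_text = pd.DataFrame({"Conc": conc}, index=hours.strftime("%Y-%m-%d %H:%M"))
# Case 2 (assumption): the index is a PeriodIndex.
as_period = pd.DataFrame({"Conc": conc}, index=hours.to_period("h"))


def align_and_concat(left, right):
    """Concatenate column-wise, keeping only the rows of `left`."""
    # reindex(left.index) plays the role of the old join_axes=[left.index]
    return pd.concat([left, right], axis=1).reindex(left.index)


# String labels never equal Timestamp labels, so gg's columns come back as NaN.
broken = align_and_concat(as_text, gg)

# Fix for case 1: parse the string labels into a DatetimeIndex.
fixed_text = as_text.copy()
fixed_text.index = pd.to_datetime(fixed_text.index)

# Fix for case 2: convert each period to the timestamp at its start.
fixed_period = as_period.to_timestamp()

result_text = align_and_concat(fixed_text, gg)
result_period = align_and_concat(fixed_period, gg)

print(broken)
print(result_text)
print(result_period)
```

Converting the index directly avoids the `reset_index` / `set_index` round trip and does not depend on the name of the index column.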