Problem with resampling and converting True/False to 1 or 0

Time: 2016-07-04 08:53:00

Tags: python pandas resampling

OK, so I have this code, which reads this CSV file:

2004-02-27 00:00,7.7,167
2004-02-27 00:05,7.5,169
2004-02-27 00:10,7.2,173
2004-02-27 00:15,7,176
2004-02-27 00:20,6.5,176
2004-02-27 00:25,6,178
2004-02-27 00:30,5.6,184
2004-02-27 00:35,5.5,185
2004-02-27 00:40,5.8,182
2004-02-27 00:45,6.3,178
2004-02-27 00:50,6.6,179
2004-02-27 00:55,6.6,186
2004-02-27 01:00,6.6,189
2004-02-27 01:05,6.5,190
2004-02-27 01:10,6,192
2004-02-27 01:15,6.2,195
2004-02-27 01:20,6.4,197
2004-02-27 01:25,6.7,197
2004-02-27 01:30,7,198
2004-02-27 01:35,7.2,200
2004-02-27 01:40,7.5,202
2004-02-27 01:45,7.8,206
2004-02-27 01:50,8.1,206
2004-02-27 01:55,7.9,202
2004-02-27 02:00,7.5,196
2004-02-27 02:05,7.3,191
2004-02-27 02:10,7.7,189
2004-02-27 02:15,7.7,185
2004-02-27 02:20,7.4,173
2004-02-27 02:25,6.8,172
2004-02-27 02:30,6,183
2004-02-27 02:35,5.8,188
2004-02-27 02:40,5.8,186
2004-02-27 02:45,6,177
2004-02-27 02:50,6,174
2004-02-27 02:55,5.8,179
2004-02-27 03:00,5.4,183
2004-02-27 03:05,5.6,177
2004-02-27 03:10,5.8,174
2004-02-27 03:15,5.7,176
2004-02-27 03:20,5.7,176
2004-02-27 03:25,5.4,177
2004-02-27 03:30,4.9,174
2004-02-27 03:35,4.6,174
2004-02-27 03:40,4.4,175
2004-02-27 03:45,4.2,175
2004-02-27 03:50,3.9,170
2004-02-27 03:55,4,161
2004-02-27 04:00,4,157
2004-02-27 04:05,3.8,158
2004-02-27 04:10,3.7,158
2004-02-27 04:15,3.4,155
2004-02-27 04:20,3.5,153
2004-02-27 04:25,3.9,154
2004-02-27 04:30,4.2,152
2004-02-27 04:35,4.3,146
2004-02-27 04:40,4.1,139
2004-02-27 04:45,3.8,134
2004-02-27 04:50,3.9,132
2004-02-27 04:55,4.7,130
2004-02-27 05:00,5.6,125
2004-02-27 05:05,5.6,121
2004-02-27 05:10,5,119
2004-02-27 05:15,4.2,120
2004-02-27 05:20,3.8,120
2004-02-27 05:25,4.3,116
2004-02-27 05:30,5.1,112
2004-02-27 05:35,5.7,110
2004-02-27 05:40,6.3,106
2004-02-27 05:45,6.2,101
2004-02-27 05:50,5.7,95
2004-02-27 05:55,5.4,91
2004-02-27 06:00,4.5,90
2004-02-27 06:05,3.7,86
2004-02-27 06:10,3.2,83
2004-02-27 06:15,2.7,83
2004-02-27 06:20,2.5,78
2004-02-27 06:25,3,69
2004-02-27 06:30,4,69
2004-02-27 06:35,4.5,74
2004-02-27 06:40,4.1,70
2004-02-27 06:45,3.9,64
2004-02-27 06:50,3.9,62
2004-02-27 06:55,3.3,63
2004-02-27 07:00,3,72
2004-02-27 07:05,3.5,94
2004-02-27 07:10,3.7,110
2004-02-27 07:15,2.6,115
2004-02-27 07:20,1.7,118
2004-02-27 07:25,1.7,111
2004-02-27 07:30,1.5,104
2004-02-27 07:35,2.4,94
2004-02-27 07:40,3.8,87
2004-02-27 07:45,4.3,85
2004-02-27 07:50,4.4,89
2004-02-27 07:55,4.5,91
2004-02-27 08:00,5.2,91
2004-02-27 08:05,5.1,90
2004-02-27 08:10,4.3,79
2004-02-27 08:15,3.9,69
2004-02-27 08:20,4.1,58
2004-02-27 08:25,4.5,46
2004-02-27 08:30,5.1,31
2004-02-27 08:35,5.3,17
2004-02-27 08:40,4.7,8
2004-02-27 08:45,4.1,6
2004-02-27 08:50,3.1,11
2004-02-27 08:55,1.8,2
2004-02-27 09:00,1.2,349
2004-02-27 09:05,1.1,344
2004-02-27 09:10,1.4,350
2004-02-27 09:15,1.6,345
2004-02-27 09:20,1.7,322
2004-02-27 09:25,2.1,304
2004-02-27 09:30,2.4,296
2004-02-27 09:35,2.1,297
2004-02-27 09:40,1.8,309
2004-02-27 09:45,2,323
2004-02-27 09:50,2.6,326
2004-02-27 09:55,3,330
2004-02-27 10:00,2.8,342
2004-02-27 10:05,2.9,351
2004-02-27 10:10,3.5,348
2004-02-27 10:15,4.1,342
2004-02-27 10:20,4.7,333
2004-02-27 10:25,4.9,319
2004-02-27 10:30,5.1,309
2004-02-27 10:35,6,301
2004-02-27 10:40,6.6,299
2004-02-27 10:45,7.5,296
2004-02-27 10:50,6.9,288
2004-02-27 10:55,4.3,227
2004-02-27 11:00,2.5,184
2004-02-27 11:05,2.1,201
2004-02-27 11:10,2.4,222
2004-02-27 11:15,2.9,228
2004-02-27 11:20,2.9,226
2004-02-27 11:25,2.8,191
2004-02-27 11:30,2.8,168
2004-02-27 11:35,2.6,185
2004-02-27 11:40,3,203
2004-02-27 11:45,3.5,200
2004-02-27 11:50,3.9,174
2004-02-27 11:55,3.5,158
2004-02-27 12:00,2.4,187
2004-02-27 12:05,2.2,218
2004-02-27 12:10,1.7,218
2004-02-27 12:15,1.1,215
2004-02-27 12:20,1.1,204
2004-02-27 12:25,1.8,206
2004-02-27 12:30,2.4,226
2004-02-27 12:35,3.3,233
2004-02-27 12:40,3.9,226
2004-02-27 12:45,3.7,219
2004-02-27 12:50,3.9,214
2004-02-27 12:55,4.7,213
2004-02-27 13:00,5.1,214
2004-02-27 13:05,5.1,216
2004-02-27 13:10,5.3,217
2004-02-27 13:15,5.7,219
2004-02-27 13:20,6.1,225
2004-02-27 13:25,6.3,228
2004-02-27 13:30,6.3,225
2004-02-27 13:35,6.3,223
2004-02-27 13:40,6.2,224
2004-02-27 13:45,5.8,226
2004-02-27 13:50,5.9,231
2004-02-27 13:55,6.9,237
2004-02-27 14:00,7.7,241
2004-02-27 14:05,7.7,244
2004-02-27 14:10,7.8,244
2004-02-27 14:15,8.3,247
2004-02-27 14:20,8.6,248
2004-02-27 14:25,8.6,249
2004-02-27 14:30,9,251
2004-02-27 14:35,9.3,251
2004-02-27 14:40,9.1,250
2004-02-27 14:45,8.9,249
2004-02-27 14:50,8.7,246
2004-02-27 14:55,8.7,242
2004-02-27 15:00,8.7,243
2004-02-27 15:05,8.6,246
2004-02-27 15:10,8.9,249
2004-02-27 15:15,9.3,251
2004-02-27 15:20,9,253
2004-02-27 15:25,8.6,255
2004-02-27 15:30,8.3,257
2004-02-27 15:35,7.5,260
2004-02-27 15:40,6.9,267
2004-02-27 15:45,7.1,271
2004-02-27 15:50,7.2,270
2004-02-27 15:55,7.3,270
2004-02-27 16:00,7.7,271
2004-02-27 16:05,7.6,269
2004-02-27 16:10,7.2,265
2004-02-27 16:15,7,264
2004-02-27 16:20,7.1,266
2004-02-27 16:25,7.3,269
2004-02-27 16:30,7.6,272
2004-02-27 16:35,7.5,273
2004-02-27 16:40,7.4,273
2004-02-27 16:45,7.2,274
2004-02-27 16:50,6.6,273
2004-02-27 16:55,6,274
2004-02-27 17:00,5.6,275
2004-02-27 17:05,5.4,274
2004-02-27 17:10,5.5,269
2004-02-27 17:15,5.7,267
2004-02-27 17:20,5.6,265
2004-02-27 17:25,5.3,263
2004-02-27 17:30,5.4,261
2004-02-27 17:35,5.6,256
2004-02-27 17:40,5.4,255
2004-02-27 17:45,5,259
2004-02-27 17:50,5,264
2004-02-27 17:55,5.3,267
2004-02-27 18:00,6,266
2004-02-27 18:05,6.6,258
2004-02-27 18:10,6.6,249
2004-02-27 18:15,6.7,246
2004-02-27 18:20,6.6,244
2004-02-27 18:25,6.8,248
2004-02-27 18:30,6.5,254
2004-02-27 18:35,5.6,262
2004-02-27 18:40,6.7,267
2004-02-27 18:45,7.7,268
2004-02-27 18:50,6.5,269
2004-02-27 18:55,5.9,274
2004-02-27 19:00,6.6,270
2004-02-27 19:05,6.5,256
2004-02-27 19:10,5.6,252
2004-02-27 19:15,5.4,259
2004-02-27 19:20,4.8,255
2004-02-27 19:25,3.8,244
2004-02-27 19:30,4.1,242
2004-02-27 19:35,4.7,239
2004-02-27 19:40,5,236
2004-02-27 19:45,5.4,235
2004-02-27 19:50,6,237
2004-02-27 19:55,6.2,239
2004-02-27 20:00,6.5,240
2004-02-27 20:05,7.5,248
2004-02-27 20:10,7.8,246
2004-02-27 20:15,7.2,239
2004-02-27 20:20,7.5,244
2004-02-27 20:25,8.9,247
2004-02-27 20:30,10.5,248
2004-02-27 20:35,12.3,252
2004-02-27 20:40,12.8,252
2004-02-27 20:45,12.4,247
2004-02-27 20:50,12.6,247
2004-02-27 20:55,12.4,250
2004-02-27 21:00,11.9,252
2004-02-27 21:05,11.8,252
2004-02-27 21:10,11.7,252
2004-02-27 21:15,11.4,252
2004-02-27 21:20,10.8,252
2004-02-27 21:25,10.6,251
2004-02-27 21:30,10.6,251
2004-02-27 21:35,10.9,251
2004-02-27 21:40,11.6,250
2004-02-27 21:45,11.9,251
2004-02-27 21:50,11.4,254
2004-02-27 21:55,11.1,255
2004-02-27 22:00,11.5,256
2004-02-27 22:05,11.9,256
2004-02-27 22:10,11.9,255
2004-02-27 22:15,12,255
2004-02-27 22:20,12.2,257
2004-02-27 22:25,12.4,258
2004-02-27 22:30,12.5,257
2004-02-27 22:35,12.6,257
2004-02-27 22:40,12.7,258
2004-02-27 22:45,12.9,259
2004-02-27 22:50,13.6,259
2004-02-27 22:55,14.3,259
2004-02-27 23:00,14.2,262
2004-02-27 23:05,13.8,263
2004-02-27 23:10,14.1,263
2004-02-27 23:15,14.8,264
2004-02-27 23:20,14.7,263
2004-02-27 23:25,13.6,263
2004-02-27 23:30,12.9,263
2004-02-27 23:35,13.2,261
2004-02-27 23:40,12.4,261
2004-02-27 23:45,12.2,262
2004-02-27 23:50,12.9,261
2004-02-27 23:55,11.6,260
2004-02-28 00:00,10.8,259
2004-02-28 00:05,10.9,260
2004-02-28 00:10,10.6,262
2004-02-28 00:15,10.1,264
2004-02-28 00:20,9.3,264

What I want it to do is, for each data point, assign a flag (1 or 0) based on the wind direction, and likewise flags for whether the wind speed is less than 4 or greater than 4.

Then I want to take the hourly mean, and if an hour contains a bad data point (so its mean is less than 1), convert that hour's flag to 0.

However, when stepping through the data row by row, the value is stored as an integer, but the column then gets converted to dtype object.

My code is as follows:

import pandas as pd

names = ['Date', 'Wind Speed', 'Wind Direction']
df_met = pd.read_csv('Met_Test.csv', index_col=0, names=names, parse_dates=[0])
df1 = df_met
df1.insert(2, 'Wind_direction_Flag', '1')
df1.insert(3, 'Wind_Speed_Less_than_4', '0')
df1.insert(4, 'Middle', '0')
df1.insert(5, 'Wind_Speed_Greater_than_10', '0')

for line in df1.iterrows():
    flag1 = (line[1]['Wind Direction'] > 250) & (line[1]['Wind Direction'] < 345)
    flag2 = (line[1]['Wind Speed'] < 4)
    flag3 = (line[1]['Wind Speed'] >= 4) & (line[1]['Wind Speed'] <= 10)
    flag4 = (line[1]['Wind Speed'] > 10)

    line[1]['Wind_direction_Flag'] = int(flag1)
    line[1]['Wind_Speed_Less_than_4'] = int(flag2)
    line[1]['Middle'] = int(flag3)
    line[1]['Wind_Speed_Greater_than_10'] = int(flag4)

print(df1['Middle'])
print(df1['Middle'].dtype)

df1.to_csv("All_Met.csv")

# Take an hourly average
df2 = df1.resample('h').mean()
df2.loc[df2['Wind_direction_Flag'] == 0, 'Wind_direction_Flag'] = -100
df2.loc[df2['Wind_direction_Flag'] > 0, 'Wind_direction_Flag'] = 0
df2.loc[df2['Wind_direction_Flag'] < 0, 'Wind_direction_Flag'] = 1
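The two problems described in the question can be reproduced with a minimal sketch on made-up data (not the Met_Test.csv file): inserting a column with the string `'0'` gives it object dtype from the start, and `iterrows()` hands back copies of each row, so assignments into the row never reach the DataFrame.

```python
import pandas as pd

# Minimal reproduction (hypothetical data, not Met_Test.csv) of the two
# issues in the question: string defaults give the column object dtype,
# and writing into iterrows() rows does not modify the DataFrame.
df = pd.DataFrame({'Wind Speed': [3.0, 5.0, 12.0]})
df.insert(1, 'Wind_Speed_Less_than_4', '0')  # string default -> object dtype
print(df['Wind_Speed_Less_than_4'].dtype)    # object

# iterrows() yields a copy of each row, so these assignments are lost.
for idx, row in df.iterrows():
    row['Wind_Speed_Less_than_4'] = int(row['Wind Speed'] < 4)

print(df['Wind_Speed_Less_than_4'].tolist())  # still ['0', '0', '0']
```

Because the row Series returned by `iterrows()` is a copy whenever the frame has mixed dtypes, per-row assignment is both slow and silently ineffective; whole-column vectorized comparisons avoid both problems.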

Thanks in advance!

1 Answer:

Answer 0 (score: 0)

I think this works:

import pandas as pd

names = ['Date', 'Wind Speed', 'Wind Direction']
df_met = pd.read_csv('Met_Test.csv', index_col=0, names=names, parse_dates=[0])
df1 = df_met

# Insert the flag columns with integer defaults (not the strings '1'/'0'),
# so they never pick up object dtype.
df1.insert(2, 'Wind_direction_Flag', 1)
df1.insert(3, 'Wind_Speed_Less_than_4', 0)
df1.insert(4, 'Middle', 0)
df1.insert(5, 'Wind_Speed_Greater_than_10', 0)

# Vectorized comparisons return boolean Series; astype(int) maps
# True/False to 1/0 for the whole column at once.
flag1 = (df1['Wind Direction'] > 290) & (df1['Wind Direction'] < 345)
flag1 = flag1.astype(int)

flag2 = (df1['Wind Speed'] < 4)
flag2 = flag2.astype(int)

flag3 = (df1['Wind Speed'] >= 4) & (df1['Wind Speed'] <= 10)
flag3 = flag3.astype(int)

flag4 = (df1['Wind Speed'] > 10)
flag4 = flag4.astype(int)

df1['Wind_direction_Flag'] = flag1
df1['Wind_Speed_Less_than_4'] = flag2
df1['Middle'] = flag3
df1['Wind_Speed_Greater_than_10'] = flag4

df1.to_csv("All_Met.csv")

# Take an hourly average
df2 = df1.resample('h').mean()
# Wind direction is the opposite flag, since we want 1's for good data
df2.loc[df2['Wind_direction_Flag'] == 0, 'Wind_direction_Flag'] = -100
df2.loc[df2['Wind_direction_Flag'] > 0, 'Wind_direction_Flag'] = 0
df2.loc[df2['Wind_direction_Flag'] < 0, 'Wind_direction_Flag'] = 1
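A slightly more compact variant of the same approach, as a sketch on made-up sample data rather than Met_Test.csv: the comparison itself already yields a boolean Series, so `.astype(int)` can be chained directly, and the hourly pass/fail decision can be written as a single comparison against 1 instead of three `.loc` steps.

```python
import pandas as pd

# Sketch with two hours of made-up 5-minute data: the first hour has only
# "good" wind directions (inside 250-345), the second hour only "bad" ones.
idx = pd.date_range('2004-02-27 00:00', periods=24, freq='5min')
df = pd.DataFrame({'Wind Speed': [3.0] * 12 + [6.0] * 12,
                   'Wind Direction': [300] * 12 + [200] * 12}, index=idx)

# Chained comparison + astype(int) gives 0/1 flag columns in one step.
df['Wind_direction_Flag'] = ((df['Wind Direction'] > 250) &
                             (df['Wind Direction'] < 345)).astype(int)
df['Wind_Speed_Less_than_4'] = (df['Wind Speed'] < 4).astype(int)

# After the hourly mean, a value below 1 means the hour contained at
# least one bad sample; comparing against 1 yields the final 0/1 flag.
hourly = df.resample('h').mean()
hourly['Wind_direction_Flag'] = (hourly['Wind_direction_Flag'] >= 1).astype(int)
print(hourly['Wind_direction_Flag'].tolist())  # [1, 0]
```

With `>= 1` an hour passes only if every 5-minute flag in it was 1; replacing it with `> 0` would instead accept any hour containing at least one good sample.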