Selecting data based on multiple conditions with Pandas / Python

Time: 2016-01-21 10:23:21

Tags: python python-3.x pandas

I have two dataframes. The first one has a direction column, like this:

In [9]:wtg_data[1][['U_all', 'Dir_all']].head()
Out[9]: 
                        U_all   Dir_all
timestamp                             
2015-05-09 00:00:00  6.425383  192.7583
2015-05-09 00:10:00  6.736392  196.0836
2015-05-09 00:20:00  7.613443  203.2848
2015-05-09 00:30:00  7.539424  203.4758
2015-05-09 00:40:00  7.365549  205.2733

The second one has a set of start and end angles that I need to exclude, like this:

In [16]:wake_exclusion_zone[1][['end_angle', 'start_angle']]
Out[16]: 
     end_angle  start_angle
0          NaN          NaN
1    92.766080    37.683639
2     4.587928   296.557159
3    58.302667     6.732354
4   354.386611   305.505815
5    35.865741   324.134259
6   353.667108   313.202790
7    24.513812   335.486188
8   356.721479   321.058398
9    18.416798   341.583202
10  358.340554   325.613169
11   14.495289   342.304661

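From the first dataframe I want to select (and save in a separate df) the rows whose direction (column Dir_all) falls within any of the sectors defined by the angle pairs in the second df.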
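Sector membership needs some care because several pairs wrap through north: row 2, for example, runs from 296.56° to 4.59°, so its start_angle is larger than its end_angle. A minimal sketch of the per-angle test, assuming each sector runs from start_angle to end_angle in the direction of increasing angle and using open-start, closed-end bounds; in_sector is a hypothetical helper name:

```python
def in_sector(angle, start, end):
    """Return True if `angle` (degrees) lies in the sector (start, end].

    Assumption: sectors run from `start` to `end` in the direction of
    increasing angle, so start > end means the sector wraps through 0/360.
    NaN bounds make every comparison False, so an empty sector never matches.
    """
    if start <= end:
        # ordinary sector, e.g. 37.68 -> 92.77
        return start < angle <= end
    # wrap-around sector, e.g. 296.56 -> 4.59 covers (296.56, 360) and [0, 4.59]
    return angle > start or angle <= end


print(in_sector(350.0, 296.557159, 4.587928))  # True
```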

So far I have tried the following, which works for the first sector:

export = wtg_data[1][(wtg_data[1]['Dir_all'] > wake_exclusion_zone[1]['start_angle'][1]) & (wtg_data[1]['Dir_all'] < wake_exclusion_zone[1]['end_angle'][1])]

But when I try to loop over the second df and append the matching data to export, export stays unchanged.
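The loop itself is not shown, but a common cause of this symptom is that DataFrame.append (like pd.concat) returns a new DataFrame instead of modifying the one it is called on, so calling export.append(...) without reassigning the result leaves export unchanged; append was also removed in pandas 2.0. Instead of appending pieces, a sketch that ORs together one boolean mask per sector and filters once. The small frames wtg and zones are illustrative stand-ins for wtg_data[1] and wake_exclusion_zone[1], and wrap-around sectors are assumed to pass through 0°/360°:

```python
import numpy as np
import pandas as pd

# Illustrative stand-ins for wtg_data[1] and wake_exclusion_zone[1]
wtg = pd.DataFrame(
    {"U_all": [6.4, 7.1, 5.9, 8.0], "Dir_all": [50.0, 196.1, 350.0, 2.0]},
    index=pd.date_range("2015-05-09", periods=4, freq="10min", name="timestamp"),
)
zones = pd.DataFrame(
    {"end_angle": [np.nan, 92.766080, 4.587928],
     "start_angle": [np.nan, 37.683639, 296.557159]}
)

# Start with "no row selected" and OR in each sector's condition.
mask = pd.Series(False, index=wtg.index)
for start, end in zones[["start_angle", "end_angle"]].dropna().itertuples(index=False):
    if start <= end:
        in_this_sector = (wtg["Dir_all"] > start) & (wtg["Dir_all"] <= end)
    else:  # sector wraps through 0/360 degrees
        in_this_sector = (wtg["Dir_all"] > start) | (wtg["Dir_all"] <= end)
    mask |= in_this_sector

# Filter once at the end instead of appending inside the loop.
export = wtg[mask]
print(export)
```

If you do want to keep separate pieces per sector, collect them in a list and call pd.concat once after the loop.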

1 Answer:

Answer 0 (score: 1)

You can merge every row of the first dataframe df1 with every row of the second dataframe df2 (a cross join) and then filter the result:

With the original data the output was empty, so I changed the first and second rows of df1.

print(df1)
                          U_all   Dir_all
timestamp                                
2015-05-09 00:00:00  200.000000   92.7583
2015-05-09 00:00:00  200.000000   92.7583
2015-05-09 00:10:00    6.736392  196.0836
2015-05-09 00:20:00    7.613443  203.2848
2015-05-09 00:30:00    7.539424  203.4758
2015-05-09 00:40:00    7.365549  205.2733

print(df2)
     end_angle  start_angle
0          NaN          NaN
1    92.766080    37.683639
2     4.587928   296.557159
3    58.302667     6.732354
4   354.386611   305.505815
5    35.865741   324.134259
6   353.667108   313.202790
7    24.513812   335.486188
8   356.721479   321.058398
9    18.416798   341.583202
10  358.340554   325.613169
11   14.495289   342.304661
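
import pandas as pd

# constant helper column: merging on it pairs every df1 row with every df2 row (cross join)
df1['i'] = 1
df2['i'] = 1

# move the timestamp index into a column so it survives the merge
df1 = df1.reset_index()

df = pd.merge(df1, df2, on=['i'])
# keep rows whose direction lies in (start_angle, end_angle]; the NaN row never matches
df = df[(df.Dir_all > df.start_angle) & (df.Dir_all <= df.end_angle)]
# remove the helper column
df = df.drop(['i'], axis=1)
print(df)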
    timestamp  U_all  Dir_all  end_angle  start_angle
1  2015-05-09    200  92.7583   92.76608    37.683639
13 2015-05-09    200  92.7583   92.76608    37.683639

# restore the timestamp column as the index
df = df.set_index('timestamp')
# drop duplicate rows
df = df.drop_duplicates(['U_all', 'Dir_all', 'end_angle', 'start_angle'])
print(df)
            U_all  Dir_all  end_angle  start_angle
timestamp                                         
2015-05-09    200  92.7583   92.76608    37.683639
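Note that the filter condition above only matches sectors with start_angle < end_angle; sectors that wrap through north, such as row 2 (296.56° to 4.59°), can never satisfy it. A sketch that handles both cases, assuming pandas 1.2 or later for how="cross" (which replaces the helper column i) and using illustrative data in which one direction was set to 350.0 so that it falls in a wrap-around sector:

```python
import numpy as np
import pandas as pd

# Illustrative data: the first row matches the answer's df1, the last
# direction is set to 350.0 so it lands in wrap-around sector 2.
df1 = pd.DataFrame(
    {"U_all": [200.0, 6.736392, 7.613443], "Dir_all": [92.7583, 196.0836, 350.0]},
    index=pd.DatetimeIndex(
        ["2015-05-09 00:00", "2015-05-09 00:10", "2015-05-09 00:20"], name="timestamp"
    ),
)
# First rows of the question's exclusion-zone table (row 0 is all NaN).
df2 = pd.DataFrame(
    {"end_angle": [np.nan, 92.766080, 4.587928, 58.302667, 354.386611],
     "start_angle": [np.nan, 37.683639, 296.557159, 6.732354, 305.505815]}
)

# Cross join: every observation paired with every sector (pandas >= 1.2).
df = pd.merge(df1.reset_index(), df2, how="cross")

wraps = df.start_angle > df.end_angle  # sector passes through 0/360
plain = (df.Dir_all > df.start_angle) & (df.Dir_all <= df.end_angle)
wrapped = wraps & ((df.Dir_all > df.start_angle) | (df.Dir_all <= df.end_angle))

# NaN comparisons are False, so the empty sector in row 0 never matches.
df = df[plain | wrapped].set_index("timestamp")
print(df)
```

The 350.0 observation appears twice here because it lies in two overlapping sectors (rows 2 and 4); deduplicating on U_all and Dir_all only would keep it once.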