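Mean of only two consecutive rows, with a conditional statement

Time: 2018-09-13 10:54:33

Tags: python pandas dataframe

After searching for similar questions I found this and this question. Unfortunately, neither of them works for me.

The first one operates on all columns; the second does not work with my True/False values and returns an error (which I did not fully understand either).

Here is the description of the problem:

I am working with a dataframe of about 54k rows. Here is a sample of 24 values: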

+----+---------------------+---------------------+----------------------+--------------------+-------+
|    |        date         |       omegasr       |        omega         |      omegass       | isday |
+----+---------------------+---------------------+----------------------+--------------------+-------+
|  1 | 2012-03-27 00:00:00 | -1.5707963267948966 |    -3.32335035194977 | 1.5707963267948966 | False |
|  2 | 2012-03-27 01:00:00 | -1.5707963267948966 |  -3.0615509641506207 | 1.5707963267948966 | False |
|  3 | 2012-03-27 02:00:00 | -1.5707963267948966 |   -2.799751576351471 | 1.5707963267948966 | False |
|  4 | 2012-03-27 03:00:00 | -1.5707963267948966 |  -2.5379521885523215 | 1.5707963267948966 | False |
|  5 | 2012-03-27 04:00:00 | -1.5707963267948966 |  -2.2761528007531724 | 1.5707963267948966 | False |
|  6 | 2012-03-27 05:00:00 | -1.5707963267948966 |   -2.014353412954023 | 1.5707963267948966 | False |
|  7 | 2012-03-27 06:00:00 | -1.5707963267948966 |  -1.7525540251548732 | 1.5707963267948966 | False |
|  8 | 2012-03-27 07:00:00 | -1.5707963267948966 |  -1.4907546373557239 | 1.5707963267948966 | True  |
|  9 | 2012-03-27 08:00:00 | -1.5707963267948966 |  -1.2289552495565745 | 1.5707963267948966 | True  |
| 10 | 2012-03-27 09:00:00 | -1.5707963267948966 |  -0.9671558617574253 | 1.5707963267948966 | True  |
| 11 | 2012-03-27 10:00:00 | -1.5707963267948966 |  -0.7053564739582756 | 1.5707963267948966 | True  |
| 12 | 2012-03-27 11:00:00 | -1.5707963267948966 | -0.44355708615912615 | 1.5707963267948966 | True  |
| 13 | 2012-03-27 12:00:00 | -1.5707963267948966 |  -0.1817576983599767 | 1.5707963267948966 | True  |
| 14 | 2012-03-27 13:00:00 | -1.5707963267948966 |  0.08004168943917273 | 1.5707963267948966 | True  |
| 15 | 2012-03-27 14:00:00 | -1.5707963267948966 |  0.34184107723832213 | 1.5707963267948966 | True  |
| 16 | 2012-03-27 15:00:00 | -1.5707963267948966 |   0.6036404650374716 | 1.5707963267948966 | True  |
| 17 | 2012-03-27 16:00:00 | -1.5707963267948966 |   0.8654398528366211 | 1.5707963267948966 | True  |
| 18 | 2012-03-27 17:00:00 | -1.5707963267948966 |    1.127239240635771 | 1.5707963267948966 | True  |
| 19 | 2012-03-27 18:00:00 | -1.5707963267948966 |   1.3890386284349199 | 1.5707963267948966 | True  |
| 20 | 2012-03-27 19:00:00 | -1.5707963267948966 |   1.6508380162340692 | 1.5707963267948966 | False |
| 21 | 2012-03-27 20:00:00 | -1.5707963267948966 |   1.9126374040332188 | 1.5707963267948966 | False |
| 22 | 2012-03-27 21:00:00 | -1.5707963267948966 |    2.174436791832368 | 1.5707963267948966 | False |
| 23 | 2012-03-27 22:00:00 | -1.5707963267948966 |   2.4362361796315177 | 1.5707963267948966 | False |
| 24 | 2012-03-27 23:00:00 | -1.5707963267948966 |    2.698035567430667 | 1.5707963267948966 | False |
+----+---------------------+---------------------+----------------------+--------------------+-------+

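omega is the solar hour angle, in radians. Between 00:00 and 24:00 it spans about -pi to +pi (the sample advances by pi/12 per hour), and its value is 0 at solar noon.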

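omegass is the hour angle at which sunset occurs. Because of the symmetry of the sun-earth system, omegasr = -omegass. These values are constant within a day but change from day to day.

isday is the result of a conditional expression: when omegasr < omega < omegass it is daytime, and further calculations can be carried out.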
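As a minimal sketch of how such a flag could be built, the snippet below recreates a day similar to the sample. The starting angle, the pi/12 step, and the constant omegasr/omegass values are read off the table above; the construction itself is an assumption, not the code that was actually used.

```python
import numpy as np
import pandas as pd

# Synthetic day modelled on the sample: omega advances by pi/12 per hour.
df = pd.DataFrame({
    "omega": -3.32335035194977 + (np.pi / 12) * np.arange(24),
    "omegasr": -np.pi / 2,  # sunrise hour angle (constant within the day)
    "omegass": np.pi / 2,   # sunset hour angle
})

# Daytime means the hour angle lies strictly between sunrise and sunset.
df["isday"] = (df["omegasr"] < df["omega"]) & (df["omega"] < df["omegass"])

print(df.loc[df["isday"]].index.tolist())
```

With a 0-based index this flags rows 7 to 18, i.e. 07:00 to 18:00, which agrees with the isday column of the sample.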

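For the further calculations I need to associate each hour with the midpoint of the time span covered by that measurement. For example, the noon measurement is recorded at 12:00, but to represent the whole hour I want the hour angle at 12:30. So I need

omegam[i] = (omega[i] + omega[i+1]) / 2

where i is the row index.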
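Ignoring sunrise and sunset for a moment, this forward-looking mean of each row with the next one can be sketched with `Series.shift`. The three values are the first omega entries of the sample; the variable names are only illustrative.

```python
import pandas as pd

# First three omega values from the sample table.
omega = pd.Series([-3.32335035194977, -3.0615509641506207, -2.799751576351471])

# shift(-1) aligns omega[i+1] with row i, so this is (omega[i] + omega[i+1]) / 2.
omegam = (omega + omega.shift(-1)) / 2

print(omegam)
```

The last row has no successor, so its result is NaN. `omega.rolling(2).mean()` computes the same pairwise means but stores each one on the later row of the pair, so it would need an extra `shift(-1)` to match this layout.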

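But a new problem arises here: if sunrise happens at, say, 6:40 am, the midpoint must be computed like this:

omegam[i] = (omegasr[i] + omega[i+1]) / 2  # sunrise
omegam[i] = (omega[i] + omegass[i+1]) / 2  # sunset

so that the hourly angle corresponds to 6:50 am. I created the isday column to help with this task, but unfortunately I have not managed to put it to use.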
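To check the clock times involved, an hour angle can be converted back to a time of day. The sketch below assumes that omega advances by pi/12 per hour and uses the 07:00 row of the sample as the reference; `clock_time` is a hypothetical helper, not part of any library.

```python
import datetime as dt
import math

RAD_PER_HOUR = math.pi / 12  # assumed: a full turn of 2*pi in 24 hours

def clock_time(angle, ref_time, ref_angle):
    """Convert an hour angle to a clock time, given one known (time, angle) pair."""
    hours = (angle - ref_angle) / RAD_PER_HOUR
    return ref_time + dt.timedelta(hours=hours)

ref_time = dt.datetime(2012, 3, 27, 7, 0)  # the 07:00 row of the sample
ref_angle = -1.4907546373557239            # omega at 07:00
omegasr = -math.pi / 2                     # sunrise hour angle

sunrise = clock_time(omegasr, ref_time, ref_angle)
midpoint = clock_time((omegasr + ref_angle) / 2, ref_time, ref_angle)
print(sunrise.time(), midpoint.time())
```

For the sample day this puts sunrise at roughly 06:41 and the sunrise midpoint at roughly 06:50, in line with the 6:40 and 6:50 used as an illustration above.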

Thank you.

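Edit:

The solution proposed by @Mabel Villaba is not correct, because the new_omega column contains only the sunrise and sunset values.

The correct new_omega column would be: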

 new_omega  
... 
7   #here the mean is between omegasr and omega[8], therefore this new_omega value can't have a correct value, according to the proposed solution.

8   -1.2289552495565745 # = omega[9]       
9   omega[10]  #                  
10  omega[11]
... 
17   omega[18] 
18   omega[19] 
19   1.570796  #omegass
...

I hope this is clear enough.

EDIT2:

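Thanks again, but the values are still not right: the mean is still computed incorrectly. I have worked out the correct values by hand and am posting them here: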

     omegam

...
7    -1.530775
8    -1.359855
9    -1.098058
...
13   -0.05256705
...
19   1.47992
...

EDIT3:

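I think the column df['isday'], which comes from a boolean mask, may be misleading.

In fact, sunrise always happens between two rows. Call their values omega1 and omega2, belonging to row1 and row2 respectively. The same holds for sunset, with omega3 and omega4. The correct omegam for row1 is then computed as:

omegam(row1) = (omegasr + omega2)/2

but row1 has a False value in the isday column.

For sunset the situation is exactly the opposite: it happens between row3 and row4, and the formula is:

omegam(row3) = (omega3 + omegass)/2

while row3 has a True value in isday.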
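The row1/row3 rule described above can be sketched with `numpy.select`. This is only one possible reading of the description: the synthetic day, the 0-based index, and the helper columns `next_isday`, `before_sunrise` and `before_sunset` are assumptions introduced for illustration.

```python
import numpy as np
import pandas as pd

# Synthetic day modelled on the sample (0-based index, one row per hour).
df = pd.DataFrame({
    "omega": -3.32335035194977 + (np.pi / 12) * np.arange(24),
    "omegasr": -np.pi / 2,
    "omegass": np.pi / 2,
})
df["isday"] = (df["omegasr"] < df["omega"]) & (df["omega"] < df["omegass"])

# isday of the following row; the last row has no successor, so treat it as night.
next_isday = df["isday"].shift(-1, fill_value=False).astype(bool)
before_sunrise = ~df["isday"] & next_isday  # row1: last night row before sunrise
before_sunset = df["isday"] & ~next_isday   # row3: last day row before sunset

next_omega = df["omega"].shift(-1)
df["omegam"] = np.select(
    [before_sunrise, before_sunset],
    [(df["omegasr"] + next_omega) / 2,    # (omegasr + omega2) / 2
     (df["omega"] + df["omegass"]) / 2],  # (omega3 + omegass) / 2
    default=(df["omega"] + next_omega) / 2,
)

print(df.loc[[6, 7, 18], "omegam"])
```

On this synthetic day, rows 6, 7 and 18 (06:00, 07:00 and 18:00) come out at about -1.530775, -1.359855 and 1.479917, which are the hand-computed values listed under EDIT2 for rows 7, 8 and 19 of the 1-based table. A frame whose final row is still daytime would be flagged as a sunset row by this sketch and would need separate handling.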

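1 answer:

Answer 0: (score: 1)

Edit

In the case you describe it is a bit more complicated, but I think I have found a way to do it. Part of the confusion is that the operations at sunrise and at sunset do not go in the same direction.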

Let's create two columns, omega1 and omega2, which carry out omegam[i] = 0.5 * (omega[i] + omegasr[i+1]) and omegam[i] = 0.5 * (omega[i-1] + omegass[i]) respectively:

df['omega1'] = .5*((df['omega'] + df['omegasr'].shift(-1)))
df['omega2'] = .5*((df['omega'].shift(1) + df['omegass']))

Then we need to create a mask that tells us whether a row is a sunrise, a sunset, or neither:

df['mask'] = (df['isday'] * 1).diff().bfill()

>> df[['date','mask', 'isday']]

                   date  mask  isday
0   2012-03-27 00:00:00   0.0  False
1   2012-03-27 01:00:00   0.0  False
2   2012-03-27 02:00:00   0.0  False
3   2012-03-27 03:00:00   0.0  False
4   2012-03-27 04:00:00   0.0  False
5   2012-03-27 05:00:00   0.0  False
6   2012-03-27 06:00:00   0.0  False
7   2012-03-27 07:00:00   1.0   True
8   2012-03-27 08:00:00   0.0   True
9   2012-03-27 09:00:00   0.0   True
10  2012-03-27 10:00:00   0.0   True
11  2012-03-27 11:00:00   0.0   True
12  2012-03-27 12:00:00   0.0   True
13  2012-03-27 13:00:00   0.0   True
14  2012-03-27 14:00:00   0.0   True
15  2012-03-27 15:00:00   0.0   True
16  2012-03-27 16:00:00   0.0   True
17  2012-03-27 17:00:00   0.0   True
18  2012-03-27 18:00:00   0.0   True
19  2012-03-27 19:00:00  -1.0  False
20  2012-03-27 20:00:00   0.0  False
21  2012-03-27 21:00:00   0.0  False
22  2012-03-27 22:00:00   0.0  False
23  2012-03-27 23:00:00   0.0  False

This way, df['mask']==1 corresponds to sunrise, df['mask']==-1 corresponds to sunset, and df['mask']==0 corresponds to everything else.
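To see why the mask takes these values, here is a small sketch of the same `diff()` trick on a seven-row boolean series. The series is made up for illustration, and `astype(int)` is used in place of `* 1`; both turn True/False into 1/0.

```python
import pandas as pd

# Hypothetical short day: night, night, day, day, day, night, night.
isday = pd.Series([False, False, True, True, True, False, False])

# diff() on the 0/1 flags gives +1 where night turns into day (sunrise),
# -1 where day turns into night (sunset), and 0 where nothing changes.
mask = isday.astype(int).diff().bfill()

print(mask.tolist())
```

The first `diff()` value is always NaN, and `bfill()` fills it with the value of the next row. That is harmless when the frame starts with at least two rows of the same kind, as it does here; if a frame could start right at a transition, `fillna(0)` may be the safer choice.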

Based on these conditions we can create omegam:

df['omegam'] = df['omega'].rolling(2).mean() * (df['mask'] == 0) + \
               df['omega1'] * (df['mask']==1) + \
               df['omega2'] * (df['mask']==-1)

>> df[['date','omegam']]

                   date    omegam
0   2012-03-27 00:00:00       NaN
1   2012-03-27 01:00:00 -3.192451
2   2012-03-27 02:00:00 -2.930651
3   2012-03-27 03:00:00 -2.668852
4   2012-03-27 04:00:00 -2.407052
5   2012-03-27 05:00:00 -2.145253
6   2012-03-27 06:00:00 -1.883454
7   2012-03-27 07:00:00 -1.530775
8   2012-03-27 08:00:00 -1.359855
9   2012-03-27 09:00:00 -1.098056
10  2012-03-27 10:00:00 -0.836256
11  2012-03-27 11:00:00 -0.574457
12  2012-03-27 12:00:00 -0.312657
13  2012-03-27 13:00:00 -0.050858
14  2012-03-27 14:00:00  0.210941
15  2012-03-27 15:00:00  0.472741
16  2012-03-27 16:00:00  0.734540
17  2012-03-27 17:00:00  0.996340
18  2012-03-27 18:00:00  1.258139
19  2012-03-27 19:00:00  1.479917
20  2012-03-27 20:00:00  1.781738
21  2012-03-27 21:00:00  2.043537
22  2012-03-27 22:00:00  2.305336
23  2012-03-27 23:00:00       NaN

Old solution

As you mentioned, since omegasr = -omegass you can create a new column in pandas based on the hour, so that you get the omega needed for the mean operation (if sunrise (hour < 12): omegasr, otherwise: -omegasr):

import pandas as pd

df['new_omega'] = df.apply(lambda x: x['omegasr'] if pd.to_datetime(x['date']).hour < 12 else -x['omegasr'], axis=1).shift(-1)

The data in 'new_omega' has been shifted to comply with:

omegam[i] = (omegasr[i] + omega[i+1]) / 2  # sunrise
omegam[i] = (omega[i] + omegass[i+1]) / 2  # sunset

>> df

                     date   omegasr     omega   omegass  isday  new_omega

1    2012-03-27 00:00:00  -1.570796 -3.323350  1.570796  False  -1.570796
2    2012-03-27 01:00:00  -1.570796 -3.061551  1.570796  False  -1.570796
3    2012-03-27 02:00:00  -1.570796 -2.799752  1.570796  False  -1.570796
4    2012-03-27 03:00:00  -1.570796 -2.537952  1.570796  False  -1.570796
5    2012-03-27 04:00:00  -1.570796 -2.276153  1.570796  False  -1.570796
6    2012-03-27 05:00:00  -1.570796 -2.014353  1.570796  False  -1.570796
7    2012-03-27 06:00:00  -1.570796 -1.752554  1.570796  False  -1.570796
8    2012-03-27 07:00:00  -1.570796 -1.490755  1.570796   True  -1.570796
9    2012-03-27 08:00:00  -1.570796 -1.228955  1.570796   True  -1.570796
10   2012-03-27 09:00:00  -1.570796 -0.967156  1.570796   True  -1.570796
11   2012-03-27 10:00:00  -1.570796 -0.705356  1.570796   True  -1.570796
12   2012-03-27 11:00:00  -1.570796 -0.443557  1.570796   True   1.570796
13   2012-03-27 12:00:00  -1.570796 -0.181758  1.570796   True   1.570796
14   2012-03-27 13:00:00  -1.570796  0.080042  1.570796   True   1.570796
15   2012-03-27 14:00:00  -1.570796  0.341841  1.570796   True   1.570796
16   2012-03-27 15:00:00  -1.570796  0.603640  1.570796   True   1.570796
17   2012-03-27 16:00:00  -1.570796  0.865440  1.570796   True   1.570796
18   2012-03-27 17:00:00  -1.570796  1.127239  1.570796   True   1.570796
19   2012-03-27 18:00:00  -1.570796  1.389039  1.570796   True   1.570796
20   2012-03-27 19:00:00  -1.570796  1.650838  1.570796  False   1.570796
21   2012-03-27 20:00:00  -1.570796  1.912637  1.570796  False   1.570796
22   2012-03-27 21:00:00  -1.570796  2.174437  1.570796  False   1.570796
23   2012-03-27 22:00:00  -1.570796  2.436236  1.570796  False   1.570796
24   2012-03-27 23:00:00  -1.570796  2.698036  1.570796  False        NaN

Then omegam can be obtained by taking the mean of the columns omega and new_omega wherever df['isday']==True, or mean(omega[i], omega[i+1]) wherever df['isday']==False.

Hope it helps.