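After searching for similar questions, I found this and this question. Unfortunately, neither of them works for me.
The first one operates on all columns, and the second one does not work with my True and False values and returns an error (which I also did not fully understand).
Here is a description of the problem:
I am working with a DataFrame of about 54k rows. Here is a sample of 24 values: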
+----+---------------------+---------------------+----------------------+--------------------+-------+
| | date | omegasr | omega | omegass | isday |
+----+---------------------+---------------------+----------------------+--------------------+-------+
| 1 | 2012-03-27 00:00:00 | -1.5707963267948966 | -3.32335035194977 | 1.5707963267948966 | False |
| 2 | 2012-03-27 01:00:00 | -1.5707963267948966 | -3.0615509641506207 | 1.5707963267948966 | False |
| 3 | 2012-03-27 02:00:00 | -1.5707963267948966 | -2.799751576351471 | 1.5707963267948966 | False |
| 4 | 2012-03-27 03:00:00 | -1.5707963267948966 | -2.5379521885523215 | 1.5707963267948966 | False |
| 5 | 2012-03-27 04:00:00 | -1.5707963267948966 | -2.2761528007531724 | 1.5707963267948966 | False |
| 6 | 2012-03-27 05:00:00 | -1.5707963267948966 | -2.014353412954023 | 1.5707963267948966 | False |
| 7 | 2012-03-27 06:00:00 | -1.5707963267948966 | -1.7525540251548732 | 1.5707963267948966 | False |
| 8 | 2012-03-27 07:00:00 | -1.5707963267948966 | -1.4907546373557239 | 1.5707963267948966 | True |
| 9 | 2012-03-27 08:00:00 | -1.5707963267948966 | -1.2289552495565745 | 1.5707963267948966 | True |
| 10 | 2012-03-27 09:00:00 | -1.5707963267948966 | -0.9671558617574253 | 1.5707963267948966 | True |
| 11 | 2012-03-27 10:00:00 | -1.5707963267948966 | -0.7053564739582756 | 1.5707963267948966 | True |
| 12 | 2012-03-27 11:00:00 | -1.5707963267948966 | -0.44355708615912615 | 1.5707963267948966 | True |
| 13 | 2012-03-27 12:00:00 | -1.5707963267948966 | -0.1817576983599767 | 1.5707963267948966 | True |
| 14 | 2012-03-27 13:00:00 | -1.5707963267948966 | 0.08004168943917273 | 1.5707963267948966 | True |
| 15 | 2012-03-27 14:00:00 | -1.5707963267948966 | 0.34184107723832213 | 1.5707963267948966 | True |
| 16 | 2012-03-27 15:00:00 | -1.5707963267948966 | 0.6036404650374716 | 1.5707963267948966 | True |
| 17 | 2012-03-27 16:00:00 | -1.5707963267948966 | 0.8654398528366211 | 1.5707963267948966 | True |
| 18 | 2012-03-27 17:00:00 | -1.5707963267948966 | 1.127239240635771 | 1.5707963267948966 | True |
| 19 | 2012-03-27 18:00:00 | -1.5707963267948966 | 1.3890386284349199 | 1.5707963267948966 | True |
| 20 | 2012-03-27 19:00:00 | -1.5707963267948966 | 1.6508380162340692 | 1.5707963267948966 | False |
| 21 | 2012-03-27 20:00:00 | -1.5707963267948966 | 1.9126374040332188 | 1.5707963267948966 | False |
| 22 | 2012-03-27 21:00:00 | -1.5707963267948966 | 2.174436791832368 | 1.5707963267948966 | False |
| 23 | 2012-03-27 22:00:00 | -1.5707963267948966 | 2.4362361796315177 | 1.5707963267948966 | False |
| 24 | 2012-03-27 23:00:00 | -1.5707963267948966 | 2.698035567430667 | 1.5707963267948966 | False |
+----+---------------------+---------------------+----------------------+--------------------+-------+
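omega is the solar hour angle, in radians. It ranges from -pi / 2 to + pi / 2 between 00:00 and 24:00. At noon its value is 0.
omegass is the hour angle at which sunset occurs. Because of the symmetry of the Sun-Earth system, omegasr = -omegass. These values are constant within a day but change from day to day.
The isday column is the result of a conditional expression: when omegasr < omega < omegass, it is daytime and further calculations can be made.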
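As a minimal sketch of that condition, assuming a pandas DataFrame named df with the columns shown in the table. The sample is regenerated here from the first omega value plus steps of pi / 12 per hour; that step is an assumption, chosen because it reproduces the table to the printed precision.

```python
import numpy as np
import pandas as pd

hours = np.arange(24)
df = pd.DataFrame({
    "date": pd.Timestamp("2012-03-27") + pd.to_timedelta(hours, unit="h"),
    "omegasr": -np.pi / 2,                             # sunrise hour angle
    "omega": -3.32335035194977 + hours * np.pi / 12,   # assumed hourly step
    "omegass": np.pi / 2,                              # sunset hour angle
})

# Daytime means the hour angle lies strictly between sunrise and sunset.
df["isday"] = (df["omega"] > df["omegasr"]) & (df["omega"] < df["omegass"])
day_hours = df.loc[df["isday"], "date"].dt.hour.tolist()
```

With the sample values, day_hours covers 07:00 through 18:00, which matches the isday column of the table.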
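For the further calculations, I need to associate with each hour the midpoint of the time span covered by that measurement. For example, the noon measurement is recorded at 12:00, but to represent the whole hour I want the hour angle at 12:30. So I need
omegam[i] = (omega[i],omega[i+1]).mean()
where i is the row index.
But here a new problem arises: if sunrise happens at, say, 6:40 am, the midpoint has to be computed like this:
omegam[i] = (omegasr[i],omega[i+1]).mean() #sunrise
omegam[i] = (omega[i],omegass[i+1]).mean() #sunset
so that the hourly angle in radians corresponds to 6:50 am. I created the isday column to help with this task, but unfortunately I have not managed to make real use of it.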
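One possible way to express the rule above without isday is to clip both ends of every hourly interval to [omegasr, omegass] before averaging. This is a sketch of a different technique from the ones discussed in this thread, not an established method. The function name clipped_midpoints is hypothetical, the sample data is regenerated with an assumed step of pi / 12 per hour, and row i's own omegasr and omegass are used for both ends of its interval.

```python
import numpy as np
import pandas as pd

hours = np.arange(24)
df = pd.DataFrame(
    {
        "omegasr": np.full(24, -np.pi / 2),
        "omega": -3.32335035194977 + hours * np.pi / 12,  # assumed hourly step
        "omegass": np.full(24, np.pi / 2),
    },
    index=pd.RangeIndex(1, 25),  # same row numbers as the sample table
)

def clipped_midpoints(frame):
    """Midpoint of each hourly interval after clipping its ends to [omegasr, omegass]."""
    sr = frame["omegasr"].to_numpy()
    ss = frame["omegass"].to_numpy()
    start = np.clip(frame["omega"].to_numpy(), sr, ss)
    end = np.clip(frame["omega"].shift(-1).to_numpy(), sr, ss)  # NaN on the last row
    mid = (start + end) / 2
    # Intervals entirely at night collapse to zero width and are left as NaN.
    return pd.Series(np.where(end > start, mid, np.nan), index=frame.index, name="omegam")

omegam = clipped_midpoints(df)
```

Here the interval that contains sunrise starts at omegasr and the interval that contains sunset ends at omegass, while ordinary daytime intervals are unaffected by the clipping.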
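Thanks.
EDIT:
The solution proposed by @Mabel Villaba is not correct, because the new_omega column contains only the sunrise and sunset values.
The correct new_omega column would be: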
new_omega
...
7 #here the mean is between omegasr and omega[8], therefore this new_omega value can't have a correct value, according to the proposed solution.
8 -1.2289552495565745 # = omega[9]
9 omega[10] #
10 omega[11]
...
17 omega[18]
18 omega[19]
19 1.570796 #omegass
...
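I hope this is clear enough.
EDIT2:
Thanks again, but the values are still not right: the means are still computed incorrectly. I have worked out the correct values by hand and am posting them here: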
omegam
...
7 -1.530775
8 -1.359855
9 -1.098058
...
13 -0.05256705
...
19 1.47992
...
EDIT3:
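I think the df['isday'] column, being obtained from a boolean mask, may be misleading.
In fact, sunrise always occurs between two rows; call them row1 and row2, with hour angles omega1 and omega2 respectively. The same holds for sunset, with row3 and row4 and hour angles omega3 and omega4. The correct omegam for row1 is then computed as:
omegam(row1) = (omegasr + omega2)/2
but row1 has the value False in the isday column.
For sunset the situation is exactly the opposite: it occurs between row3 and row4, and the value is computed as:
omegam(row3) = (omega3 + omegass)/2
and row3 has the value True.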
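Answer 0: (score: 1)
EDIT
The case you describe is a bit more complicated, but I think I have found a way to do it. Part of the confusion came from the fact that the operation at sunrise and at sunset is not applied in the same direction.
Let's create two columns, omega1 and omega2, computed as omegam[i] = 0.5 * (omega[i] + omegasr[i+1]) and omegam[i] = 0.5 * (omega[i-1] + omegass[i]) respectively:
df['omega1'] = 0.5 * (df['omega'] + df['omegasr'].shift(-1))
df['omega2'] = 0.5 * (df['omega'].shift(1) + df['omegass'])
Then we need a mask that tells us whether a row is a sunrise, a sunset, or neither:
df['mask'] = (df['isday'] * 1).diff().bfill()
>> df[['date','mask', 'isday']]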
date mask isday
0 2012-03-27 00:00:00 0.0 False
1 2012-03-27 01:00:00 0.0 False
2 2012-03-27 02:00:00 0.0 False
3 2012-03-27 03:00:00 0.0 False
4 2012-03-27 04:00:00 0.0 False
5 2012-03-27 05:00:00 0.0 False
6 2012-03-27 06:00:00 0.0 False
7 2012-03-27 07:00:00 1.0 True
8 2012-03-27 08:00:00 0.0 True
9 2012-03-27 09:00:00 0.0 True
10 2012-03-27 10:00:00 0.0 True
11 2012-03-27 11:00:00 0.0 True
12 2012-03-27 12:00:00 0.0 True
13 2012-03-27 13:00:00 0.0 True
14 2012-03-27 14:00:00 0.0 True
15 2012-03-27 15:00:00 0.0 True
16 2012-03-27 16:00:00 0.0 True
17 2012-03-27 17:00:00 0.0 True
18 2012-03-27 18:00:00 0.0 True
19 2012-03-27 19:00:00 -1.0 False
20 2012-03-27 20:00:00 0.0 False
21 2012-03-27 21:00:00 0.0 False
22 2012-03-27 22:00:00 0.0 False
23 2012-03-27 23:00:00 0.0 False
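The df['mask'] expression can be checked on a toy boolean series (hypothetical values, not taken from the data above). The sketch also illustrates one behaviour worth knowing: bfill() copies the first computed difference into row 0, so a series that changes state between row 0 and row 1 gets both rows flagged. Using fillna(0) is one possible alternative, offered here as a suggestion rather than as part of the original answer.

```python
import pandas as pd

isday = pd.Series([False, False, True, True, True, False, False])

# Cast to integers so that diff() yields +1 / -1 at the transitions.
mask = isday.astype(int).diff().bfill()
sunrise_rows = mask.index[mask == 1].tolist()
sunset_rows = mask.index[mask == -1].tolist()

# Edge case: a series that changes state between row 0 and row 1.
early = pd.Series([False, True, True]).astype(int).diff()
with_bfill = early.bfill().tolist()      # row 0 inherits the +1 of row 1
with_fillna = early.fillna(0).tolist()   # row 0 stays neutral
```

In the 24-row sample the first transition is far from row 0, so both variants give the same mask there.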
This way, df['mask']==1 corresponds to sunrise, df['mask']==-1 to sunset, and df['mask']==0 to all other rows.
With these conditions we can build omegam:
df['omegam'] = df['omega'].rolling(2).mean() * (df['mask'] == 0) + \
               df['omega1'] * (df['mask']==1) + \
               df['omega2'] * (df['mask']==-1)
>> df[['date','omegam']]
date omegam
0 2012-03-27 00:00:00 NaN
1 2012-03-27 01:00:00 -3.192451
2 2012-03-27 02:00:00 -2.930651
3 2012-03-27 03:00:00 -2.668852
4 2012-03-27 04:00:00 -2.407052
5 2012-03-27 05:00:00 -2.145253
6 2012-03-27 06:00:00 -1.883454
7 2012-03-27 07:00:00 -1.530775
8 2012-03-27 08:00:00 -1.359855
9 2012-03-27 09:00:00 -1.098056
10 2012-03-27 10:00:00 -0.836256
11 2012-03-27 11:00:00 -0.574457
12 2012-03-27 12:00:00 -0.312657
13 2012-03-27 13:00:00 -0.050858
14 2012-03-27 14:00:00 0.210941
15 2012-03-27 15:00:00 0.472741
16 2012-03-27 16:00:00 0.734540
17 2012-03-27 17:00:00 0.996340
18 2012-03-27 18:00:00 1.258139
19 2012-03-27 19:00:00 1.479917
20 2012-03-27 20:00:00 1.781738
21 2012-03-27 21:00:00 2.043537
22 2012-03-27 22:00:00 2.305336
23 2012-03-27 23:00:00 NaN
Old solution:
Since, as you mentioned, omegasr = -omegass, you can create a new column in pandas based on the hour, which gives you the omega needed for the mean operation (if sunrise (hour < 12): omegasr, else: -omegasr):
df['new_omega'] = df.apply(lambda x: x['omegasr'] if pd.to_datetime(x['date']).hour < 12 else -x['omegasr'], axis=1).shift(-1)
>> df
date omegasr omega omegass isday new_omega
1 2012-03-27 00:00:00 -1.570796 -3.323350 1.570796 False -1.570796
2 2012-03-27 01:00:00 -1.570796 -3.061551 1.570796 False -1.570796
3 2012-03-27 02:00:00 -1.570796 -2.799752 1.570796 False -1.570796
4 2012-03-27 03:00:00 -1.570796 -2.537952 1.570796 False -1.570796
5 2012-03-27 04:00:00 -1.570796 -2.276153 1.570796 False -1.570796
6 2012-03-27 05:00:00 -1.570796 -2.014353 1.570796 False -1.570796
7 2012-03-27 06:00:00 -1.570796 -1.752554 1.570796 False -1.570796
8 2012-03-27 07:00:00 -1.570796 -1.490755 1.570796 True -1.570796
9 2012-03-27 08:00:00 -1.570796 -1.228955 1.570796 True -1.570796
10 2012-03-27 09:00:00 -1.570796 -0.967156 1.570796 True -1.570796
11 2012-03-27 10:00:00 -1.570796 -0.705356 1.570796 True -1.570796
12 2012-03-27 11:00:00 -1.570796 -0.443557 1.570796 True 1.570796
13 2012-03-27 12:00:00 -1.570796 -0.181758 1.570796 True 1.570796
14 2012-03-27 13:00:00 -1.570796 0.080042 1.570796 True 1.570796
15 2012-03-27 14:00:00 -1.570796 0.341841 1.570796 True 1.570796
16 2012-03-27 15:00:00 -1.570796 0.603640 1.570796 True 1.570796
17 2012-03-27 16:00:00 -1.570796 0.865440 1.570796 True 1.570796
18 2012-03-27 17:00:00 -1.570796 1.127239 1.570796 True 1.570796
19 2012-03-27 18:00:00 -1.570796 1.389039 1.570796 True 1.570796
20 2012-03-27 19:00:00 -1.570796 1.650838 1.570796 False 1.570796
21 2012-03-27 20:00:00 -1.570796 1.912637 1.570796 False 1.570796
22 2012-03-27 21:00:00 -1.570796 2.174437 1.570796 False 1.570796
23 2012-03-27 22:00:00 -1.570796 2.436236 1.570796 False 1.570796
24 2012-03-27 23:00:00 -1.570796 2.698036 1.570796 False NaN
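The row-wise apply used for new_omega can also be written in vectorized form. The following is a sketch only, assuming the date column is (or can be parsed into) a datetime column; the small DataFrame is rebuilt here with just the two columns this step needs.

```python
import numpy as np
import pandas as pd

hours = np.arange(24)
df = pd.DataFrame({
    "date": pd.Timestamp("2012-03-27") + pd.to_timedelta(hours, unit="h"),
    "omegasr": -np.pi / 2,
})

# Morning rows take omegasr, afternoon rows take -omegasr (that is, omegass).
hour = pd.to_datetime(df["date"]).dt.hour
signed = np.where(hour < 12, df["omegasr"], -df["omegasr"])
df["new_omega"] = pd.Series(signed, index=df.index).shift(-1)
```

As in the table above, the sign flips one row before noon because of the shift(-1), and the last row is NaN.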
The data in new_omega has been shifted by one row to match:
omegam[i] = (omegasr[i],omega[i+1]).mean() #sunrise
omegam[i] = (omega[i],omegass[i+1]).mean() #sunset
Then omegam can be obtained by taking the mean of the omega and new_omega columns wherever df['isday']==True, and mean(omega[i], omega[i+1]) wherever df['isday']==False.
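For reference, the mask-based steps from the EDIT at the top of this answer can be combined into a single function. This is a minimal sketch under stated assumptions, not a drop-in replacement. The name add_omegam is hypothetical. The function uses the same row's omegasr instead of shift(-1), which assumes the sunrise angle is the same on adjacent rows of one day. It also uses fillna(0) instead of bfill(), and np.where instead of multiplying by boolean masks. The sample data is regenerated with an assumed step of pi / 12 per hour.

```python
import numpy as np
import pandas as pd

def add_omegam(frame):
    """Return a copy of frame with an omegam column built from the transition mask."""
    out = frame.copy()
    mask = out["isday"].astype(int).diff().fillna(0)  # +1 sunrise, -1 sunset, 0 otherwise
    prev_omega = out["omega"].shift(1)
    regular = (prev_omega + out["omega"]) / 2      # same as rolling(2).mean()
    sunrise = (out["omegasr"] + out["omega"]) / 2  # interval starts at sunrise
    sunset = (prev_omega + out["omegass"]) / 2     # interval ends at sunset
    out["omegam"] = np.where(mask == 1, sunrise, np.where(mask == -1, sunset, regular))
    return out

hours = np.arange(24)
sample = pd.DataFrame({
    "omegasr": -np.pi / 2,
    "omega": -3.32335035194977 + hours * np.pi / 12,  # assumed hourly step
    "omegass": np.pi / 2,
})
sample["isday"] = (sample["omega"] > sample["omegasr"]) & (sample["omega"] < sample["omegass"])
result = add_omegam(sample)
```

Because np.where selects values rather than multiplying them, a NaN in a branch that is not selected does not propagate. The last row therefore gets the ordinary two-row mean instead of NaN; the first row is still NaN because it has no predecessor.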
Hope it helps.