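I have a pandas DataFrame with 6 million rows. The columns are:
['x', 'y']
I need to apply a simple calculation between x and y and append the result to the DataFrame.
Here is what I tried: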
import math
import numpy as np

def pressure_to_elevation(P, T=None):
    '''
    Calculates the height of a pressure level in feet
    '''
    sea_level_pressure = 1013.25
    if T is not None:
        # https://www.omnicalculator.com/physics/air-pressure-at-altitude
        P0 = sea_level_pressure
        g = 9.80665
        M = 0.0289644
        R0 = 8.31447
        m = (np.log(P/P0)*T) / -(g*M/R0)
        f = 3.28084 * m
        return f
    b = 0.190284
    c = 145366.45
    return (1-math.pow((P/sea_level_pressure), b)) * c

test_df['result'] = test_df.apply(lambda row: pressure_to_elevation(row['x'], row['y']), axis=1)
Unfortunately, this takes a very long time... in fact, I have not yet seen it finish.
Is there a faster way?
Answer 0: (score: 2)
Try this:
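import numpy as np

def pressure_to_elevation(P, T):
    # P and T are whole pandas Series, so every operation is vectorized
    sea_level_pressure = 1013.25
    P0 = sea_level_pressure
    g = 9.80665
    M = 0.0289644
    R0 = 8.31447
    b = 0.190284
    c = 145366.45
    # Use the temperature-aware formula where T is present, the fallback otherwise
    return np.where(T.notnull(),
                    3.28084 * ((np.log(P/P0)*T) / -(g*M/R0)),
                    (1-np.power((P/sea_level_pressure), b)) * c)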
Usage:
test_df['result'] = pressure_to_elevation(test_df['x'], test_df['y'])
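To make the vectorized approach easier to try out, here is a minimal self-contained sketch on a three-row frame. The name `pressure_to_elevation_vec`, the frame `sample_df`, and the sample values are made up for illustration; the constants are the ones from the question, and `pd.notnull` is used so the function also accepts plain NumPy arrays.

```python
import numpy as np
import pandas as pd

def pressure_to_elevation_vec(P, T):
    """Height in feet of pressure level P; T may contain NaN (hypothetical helper)."""
    P0 = 1013.25                          # sea-level pressure from the question
    g, M, R0 = 9.80665, 0.0289644, 8.31447
    # Temperature-aware formula, evaluated for every row at once
    with_temp = 3.28084 * (np.log(P / P0) * T) / -(g * M / R0)
    # Fallback formula for rows where the temperature is missing
    no_temp = (1 - np.power(P / P0, 0.190284)) * 145366.45
    return np.where(pd.notnull(T), with_temp, no_temp)

# Illustrative data only: the second row has no temperature
sample_df = pd.DataFrame({
    "x": [1013.25, 900.0, 850.0],
    "y": [288.15, np.nan, 281.0],
})
sample_df["result"] = pressure_to_elevation_vec(sample_df["x"], sample_df["y"])
print(sample_df)
```

Both branches are computed for every row and `np.where` then picks one per row, so rows with a missing `y` fall back to the temperature-free formula instead of propagating NaN.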
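Answer 1: (score: 0)
I believe that if you break this down into separate steps and avoid iterating over the entire DataFrame row by row, the speed will increase dramatically. Consider the following.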
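import numpy as np
import pandas as pd
sea_level_pressure = 1013.25
# Formula without temperature, built up one vectorized step at a time
test_df['result_1'] = (test_df['x']/sea_level_pressure)
test_df['result_1'] = test_df['result_1']**0.190284
test_df['result_1'] = (1 - test_df['result_1'])*145366.45
test_df['result_2'] = 3.28084*((np.log(test_df['x']/sea_level_pressure)*test_df['y'])/(-1*(9.80665*0.0289644/8.31447)))
# Use result_1 where y is missing, result_2 otherwise
test_df['final_result'] = np.where(pd.isnull(test_df['y']), test_df['result_1'], test_df['result_2'])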
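One way to check the speed claim on your own machine is a rough timing sketch like the one below. Everything here is an assumption for illustration: the 20,000-row random frame `bench_df`, the helper names `elevation_row` and `elevation_vec`, and the value ranges. Absolute timings will vary, so the sketch only prints them and confirms that both approaches give the same numbers.

```python
import time
import numpy as np
import pandas as pd

P0 = 1013.25
K = 9.80665 * 0.0289644 / 8.31447  # g*M/R0 from the question

def elevation_row(P, T):
    """Scalar version, called once per row through DataFrame.apply."""
    if pd.notnull(T):
        return 3.28084 * (np.log(P / P0) * T) / -K
    return (1 - (P / P0) ** 0.190284) * 145366.45

def elevation_vec(P, T):
    """Vectorized version operating on whole columns."""
    with_temp = 3.28084 * (np.log(P / P0) * T) / -K
    no_temp = (1 - (P / P0) ** 0.190284) * 145366.45
    return np.where(pd.notnull(T), with_temp, no_temp)

# Synthetic data (assumed ranges); every fifth temperature is missing
rng = np.random.default_rng(0)
n = 20_000
y_values = rng.uniform(220.0, 310.0, n)
y_values[::5] = np.nan
bench_df = pd.DataFrame({"x": rng.uniform(300.0, 1050.0, n), "y": y_values})

start = time.perf_counter()
slow = bench_df.apply(lambda row: elevation_row(row["x"], row["y"]), axis=1)
apply_seconds = time.perf_counter() - start

start = time.perf_counter()
fast = elevation_vec(bench_df["x"], bench_df["y"])
vec_seconds = time.perf_counter() - start

print(f"apply: {apply_seconds:.3f}s, vectorized: {vec_seconds:.4f}s")
print("same results:", np.allclose(slow.to_numpy(), fast))
```

The row-wise version pays Python function-call and Series-construction overhead for every row, while the vectorized version runs a handful of array operations over the full columns.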