Fastest way to perform a calculation between two DataFrame columns?

Asked: 2017-12-07 23:04:13

Tags: python pandas dataframe

I have a pandas DataFrame with 6 million rows. The columns are:

['x', 'y']

I need to apply a simple calculation to x and y and append the result to the DataFrame as a new column.

Here is what I tried:

import math
import numpy as np

def pressure_to_elevation(P, T=None):
    '''
    Calculates the height of a pressure level in feet
    '''

    sea_level_pressure = 1013.25

    if T is not None:
        # https://www.omnicalculator.com/physics/air-pressure-at-altitude

        P0 = sea_level_pressure
        g = 9.80665
        M = 0.0289644
        R0 = 8.31447

        m = (np.log(P/P0)*T) / -(g*M/R0)
        f = 3.28084 * m
        return f

    b = 0.190284
    c = 145366.45

    return (1-math.pow((P/sea_level_pressure), b)) * c


test_df['result'] = test_df.apply(lambda row: pressure_to_elevation(row['x'], row['y']), axis=1)

Unfortunately, this takes a very long time. In fact, I have not yet seen it run to completion.

Is there a faster way?

2 Answers:

Answer 0: (score: 2)

Try this:

def pressure_to_elevation(P, T):

    sea_level_pressure = 1013.25

    P0 = sea_level_pressure
    g = 9.80665
    M = 0.0289644
    R0 = 8.31447

    b = 0.190284
    c = 145366.45

    return np.where(T.notnull(),
                    3.28084 * ((np.log(P/P0)*T) / -(g*M/R0)),
                    (1-np.power((P/sea_level_pressure), b)) * c)  # np.power: np.pow is missing in NumPy < 2.0

Usage:

test_df['result'] = pressure_to_elevation(test_df['x'], test_df['y'])
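For anyone who wants to try the idea on a small input first, here is a minimal self-contained sketch of the same vectorized approach. The names `pressure_to_elevation_vec` and `sample_df` are made up for illustration, and it assumes a missing temperature is stored as NaN in `y`.

```python
import numpy as np
import pandas as pd

SEA_LEVEL_PRESSURE = 1013.25  # same constant as in the question


def pressure_to_elevation_vec(P, T):
    """Vectorized height of a pressure level in feet; T may contain NaN."""
    P = np.asarray(P, dtype=float)
    T = np.asarray(T, dtype=float)
    g, M, R0 = 9.80665, 0.0289644, 8.31447
    b, c = 0.190284, 145366.45

    # Both formulas are evaluated for every row; np.where then picks one per row.
    with_temp = 3.28084 * (np.log(P / SEA_LEVEL_PRESSURE) * T) / -(g * M / R0)
    without_temp = (1 - np.power(P / SEA_LEVEL_PRESSURE, b)) * c
    return np.where(np.isnan(T), without_temp, with_temp)


# Hypothetical sample data: the second row has no temperature.
sample_df = pd.DataFrame({"x": [1013.25, 900.0, 850.0],
                          "y": [288.15, np.nan, 280.0]})
sample_df["result"] = pressure_to_elevation_vec(sample_df["x"], sample_df["y"])
```

Note that `np.where` computes both branches for all rows and only selects afterwards, so this trades some redundant arithmetic for avoiding the Python-level loop that `apply(..., axis=1)` performs.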

Answer 1: (score: 0)

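I believe that if you break this down into separate column-wise steps and avoid iterating over the whole DataFrame row by row, the speed will increase dramatically. Consider the following.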

sea_level_pressure = 1013.25

test_df['result_1'] = (test_df['x']/sea_level_pressure)
test_df['result_1'] = test_df['result_1']**0.190284
test_df['result_1'] = (1 - test_df['result_1'])*145366.45

test_df['result_2'] = 3.28084*((np.log(test_df['x']/sea_level_pressure)*test_df['y'])/(-1*(9.80665*0.0289644/8.31447)))

test_df['final_result'] = np.where(pd.isnull(test_df['y']), test_df['result_1'], test_df['result_2'])
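If the intermediate `result_1` and `result_2` columns are not wanted, one possible variation is to compute each formula only on the rows that need it using boolean masks. This is a sketch under assumptions, not a benchmarked improvement: `demo_df`, `has_temp`, and `scale` are hypothetical names, and a missing temperature is assumed to be NaN.

```python
import numpy as np
import pandas as pd

P0 = 1013.25  # sea level pressure, as in the question

# Hypothetical sample data: rows 1 and 3 have no temperature.
demo_df = pd.DataFrame({"x": [1000.0, 900.0, 850.0, 700.0],
                        "y": [285.0, np.nan, 280.0, np.nan]})

has_temp = demo_df["y"].notna()       # True where a temperature is available
scale = 9.80665 * 0.0289644 / 8.31447  # g * M / R0

demo_df["final_result"] = np.nan

# Temperature-dependent formula, evaluated only where y is present.
demo_df.loc[has_temp, "final_result"] = 3.28084 * (
    np.log(demo_df.loc[has_temp, "x"] / P0) * demo_df.loc[has_temp, "y"]
) / -scale

# Pressure-only formula, evaluated only where y is missing.
demo_df.loc[~has_temp, "final_result"] = (
    1 - (demo_df.loc[~has_temp, "x"] / P0) ** 0.190284
) * 145366.45
```

Whether this is faster than computing both columns and selecting with `np.where` depends on how many rows lack a temperature, so it is worth timing both on a sample of the real data.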