Question

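I'm trying to do outlier analysis in Python. Since I have several dataframes of different lengths, I want to trim 2.5% from the tail and the head when a dataframe has 10 observations, 0.25% when it has 100, and so on. Right now I have some code that seems to work. However, I still feel it could be more efficient, mainly because of the last two lines: I think the filter could be done in one line. Also, I'm not sure whether .loc is useful here. Maybe there is a better way to do this? Does anyone have a suggestion?

This is my first question, so if there is anything I can improve about it, please let me know.

Currently, this is my code: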

    df_filtered_3['variable'] = df_filtered_3['variable1'] / df_filtered_3['variable2']

    if len(df_filtered_3.index) <= 10:
        low = .025
        high = .0975
    elif len(df_filtered_3.index) <= 100:
        low = .0025
        high = .00975
    elif len(df_filtered_3.index) <= 1000:
        low = .00025
        high = .000975
    elif len(df_filtered_3.index) <= 10000:
        low = .000025
        high = .0000975
    else:
        low = .0000025
        high = .00000975

    quant_df = df_filtered_3.quantile([low, high])
    df_filtered_3 = df_filtered_3.loc[df_filtered_3['variable'] > int(quant_df.loc[low, 'variable']), :]
    df_filtered_3 = df_filtered_3.loc[df_filtered_3['variable'] < int(quant_df.loc[high, 'variable']), :]
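
As a hedged sketch of the one-line filter asked about above: both bounds can be combined into a single boolean mask, which also makes `.loc` unnecessary. The helper name `trim_by_quantiles` and the sample data are made up for illustration. The sketch assumes the upper cutoff is meant to be the complement of the lower one (`1 - low`), which differs from the `high` values in the code above, and it compares against the raw quantile values rather than wrapping them in `int()`.

```python
import pandas as pd


def trim_by_quantiles(df, column, low):
    """Keep rows whose `column` value lies strictly between two quantiles.

    Assumption (not from the original code): the upper quantile is 1 - low.
    """
    lower, upper = df[column].quantile([low, 1 - low])
    # One combined mask replaces the two consecutive .loc filters
    return df[(df[column] > lower) & (df[column] < upper)]


# Hypothetical sample data: ten observations, so low = 0.025 as in the question
sample = pd.DataFrame(
    {"variable": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 100.0]}
)
trimmed = trim_by_quantiles(sample, "variable", low=0.025)
```

With this sample the smallest and largest observations fall outside the two cutoffs and are dropped. Whether `1 - low` is the intended upper bound depends on what the original `high` values were meant to express.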

Answer 1

You can write it more concisely, though not necessarily faster:

    In [57]: coefs = np.array([.025, .0975])

    In [58]: coefs / pd.cut([len(df.index)], [0, 10, 100, 1000, 10000, np.inf], labels=[1, 10, 100, 1000, 10000], right=True)[0]
    Out[58]: array([ 0.025 ,  0.0975])

Examples:

    In [59]: coefs / pd.cut([105], [0, 10, 100, 1000, 10000, np.inf], labels=[1, 10, 100, 1000, 10000], right=True)[0]
    Out[59]: array([ 0.00025 ,  0.000975])

    In [60]: coefs / pd.cut([1005], [0, 10, 100, 1000, 10000, np.inf], labels=[1, 10, 100, 1000, 10000], right=True)[0]
    Out[60]: array([  2.50000000e-05,   9.75000000e-05])
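
The same `pd.cut` lookup could be wrapped in a small function so it can be reused across dataframes. This is only a sketch: the name `scaled_quantiles` is hypothetical, and the default coefficients are copied unchanged from the session above.

```python
import numpy as np
import pandas as pd


def scaled_quantiles(n_rows, coefs=(0.025, 0.0975)):
    """Divide the base coefficients by a power of ten chosen from the row count."""
    bins = [0, 10, 100, 1000, 10000, np.inf]
    divisors = [1, 10, 100, 1000, 10000]
    # pd.cut places n_rows in a right-closed bin and returns that bin's label
    divisor = pd.cut([n_rows], bins, labels=divisors, right=True)[0]
    return np.array(coefs) / divisor


low, high = scaled_quantiles(105)
```

Note that a row count of 0 falls outside every bin, so an empty dataframe would need to be handled separately before calling it.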

Outlier analysis in Python: is there a better/more efficient way?