Fastest way to create a column in pandas based on multiple conditions

Time: 2019-07-30 12:06:12

Tags: python-3.x pandas

I am currently using this function:

def age_groupf(row):
    if row['Age'] <= 19:
        val = '15-19'
    elif row['Age'] <= 24:
        val = '20-24'
    elif row['Age'] <= 29:
        val = '25-29'
    elif row['Age'] <= 34:
        val = '30-34'
    elif row['Age'] <= 39:
        val = '35-39'
    elif row['Age'] <= 44:
        val = '40-44'
    elif row['Age'] <= 49:
        val = '45-49'
    elif row['Age'] <= 54:
        val = '50-54'
    elif row['Age'] <= 59:
        val = '55-59'
    else:
        val = '60 and more'
    return val

The AGE-GROUP field is generated by calling:

DF['AGE-GROUP'] = DF.apply(age_groupf, axis=1)

It seems to work, but it is very slow. I have multiple 100 MB TXT files and need something faster.

1 answer:

Answer 0 (score: 1)

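Use pandas.cut with predefined bins and labels. It bins the whole column in one vectorized call instead of invoking a Python function for every row.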

For example:

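import numpy as np
import pandas as pd

bins = [15, 20, 25, 30, 35, 40, 45, 50, 55, 60, np.inf]
labels = [f'{x}-{y-1}' if y != np.inf else f'{x} and more' for x, y in zip(bins, bins[1:])]

# right=False makes each bin left-closed, [x, y), so an age of 20 falls in '20-24'
pd.cut(df['Age'], bins=bins, labels=labels, right=False)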
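
To check the bin edges, here is a minimal sketch on a small made-up frame (the sample ages and the `grouped` / `age_groups` names are only for illustration, not from the question). It assumes the intent is to match the original function, which puts every age of 19 or below into '15-19'. pd.cut leaves values outside the bins as NaN, so the sketch clips ages to a lower bound of 15 first; drop the clip if ages under 15 should stay missing.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; the real frame would be loaded from the TXT files.
df = pd.DataFrame({'Age': [12, 15, 19, 20, 24, 59, 60, 87]})

bins = [15, 20, 25, 30, 35, 40, 45, 50, 55, 60, np.inf]
labels = [f'{x}-{y-1}' if y != np.inf else f'{x} and more'
          for x, y in zip(bins, bins[1:])]

# Without clipping, an age below the first edge (12 here) is left as NaN.
unclipped = pd.cut(df['Age'], bins=bins, labels=labels, right=False)

# Assumption: mimic the original function by treating ages under 15 as 15.
grouped = pd.cut(df['Age'].clip(lower=15), bins=bins, labels=labels, right=False)
df['AGE-GROUP'] = grouped

age_groups = df['AGE-GROUP'].astype(str).tolist()
print(age_groups)
```

The result is a categorical column. Call .astype(str) on it if plain strings are needed, for instance before writing the data back to a text file.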