I am currently using this function:
def age_groupf(row):
    if row['Age'] <= 19:
        val = '15-19'
    elif row['Age'] <= 24:
        val = '20-24'
    elif row['Age'] <= 29:
        val = '25-29'
    elif row['Age'] <= 34:
        val = '30-34'
    elif row['Age'] <= 39:
        val = '35-39'
    elif row['Age'] <= 44:
        val = '40-44'
    elif row['Age'] <= 49:
        val = '45-49'
    elif row['Age'] <= 54:
        val = '50-54'
    elif row['Age'] <= 59:
        val = '55-59'
    else:
        val = '60 and more'
    return val
and generating the AGE-GROUP field by calling:
DF['AGE-GROUP'] = DF.apply(age_groupf, axis=1)
It seems to work, but it is very slow. I have several 100 MB TXT files and need this to run faster.
Answer 0 (score: 1)
Use pandas.cut with explicitly defined bins and labels.
For example:
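import numpy as np
import pandas as pd
bins = [15, 20, 25, 30, 35, 40, 45, 50, 55, 60, np.inf]
labels = [f'{x}-{y-1}' if y != np.inf else f'{x} and more' for x, y in zip(bins[:-1], bins[1:])]
# right=False gives intervals [15, 20), [20, 25), ... so that age 20 falls in '20-24'
pd.cut(df['Age'], bins=bins, labels=labels, right=False)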
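To check that the binned result lines up with the original if/elif chain, here is a minimal sketch on a small made-up DataFrame. The helper name add_age_group and the sample data are hypothetical, not from the question. It assumes ages are whole numbers, and it assumes ages below 15 should still land in '15-19' as they do in the original function, which is why the column is clipped first. Missing ages come out as NaN here rather than '60 and more'.

```python
import numpy as np
import pandas as pd

def add_age_group(df, age_col="Age"):
    """Return a categorical Series of age groups without a row-wise apply."""
    bins = [15, 20, 25, 30, 35, 40, 45, 50, 55, 60, np.inf]
    labels = [f"{lo}-{hi - 1}" if hi != np.inf else f"{lo} and more"
              for lo, hi in zip(bins[:-1], bins[1:])]
    # Assumption: ages below 15 belong in '15-19', as in the original function.
    ages = df[age_col].clip(lower=15)
    # right=False makes each bin closed on the left: [15, 20), [20, 25), ...
    return pd.cut(ages, bins=bins, labels=labels, right=False)

# Hypothetical sample data covering the bin edges.
sample = pd.DataFrame({"Age": [12, 15, 19, 20, 24, 59, 60, 87]})
sample["AGE-GROUP"] = add_age_group(sample)
print(sample)
```

Because pd.cut works on the whole column at once instead of calling a Python function per row, it is generally much faster than DF.apply(..., axis=1) on large files. The result is a categorical column; call .astype(str) on it if plain strings are needed.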