My dataset is a .csv file like the first table below. Using Python, I want to group the rows and add an internal_id column, producing the second table.
unq_id name city country supplier
053 ABC CAL UA sup_01
054 DEF NY UA sup_01
055 ABC CAL UA sup_02
056 ABC CAL UA sup_03
057 DEF NY UA sup_02
internal_id unq_id supplier
001 053 sup_01
001 055 sup_02
001 056 sup_03
002 054 sup_01
002 057 sup_02
Answer 0 (score: 0)
You can do this with Pandas and categorical data:
import pandas as pd
# read file, keeping unq_id as a string so leading zeros (e.g. 053) are preserved
df = pd.read_csv('file.csv', dtype={'unq_id': str})
# define key columns
key_cols = ['name', 'city', 'country']
# map each unique (name, city, country) combination to an integer category code
df['internal_id'] = df[key_cols].apply(tuple, axis=1).astype('category').cat.codes
# add one, convert to string, and zero-pad to three digits
df['internal_id'] = (df['internal_id'] + 1).apply(str).str.zfill(3)
# keep the requested columns, ordered by internal_id
res = df[['internal_id', 'unq_id', 'supplier']].sort_values(['internal_id', 'unq_id'])
# output result to csv
res.to_csv('file_out.csv', index=False)
print(res)
internal_id unq_id supplier
0 001 053 sup_01
2 001 055 sup_02
3 001 056 sup_03
1 002 054 sup_01
4 002 057 sup_02
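As an alternative sketch (not part of the answer above), `groupby(...).ngroup()` can number each group directly, without building tuples or categories. The inline sample data loaded through `io.StringIO`, the assumption that the file is comma-separated, and the helper name `add_internal_id` are all illustrative; in practice you would read your own file instead.

```python
import io

import pandas as pd

# Inline copy of the sample data from the question (assumed comma-separated).
SAMPLE_CSV = """unq_id,name,city,country,supplier
053,ABC,CAL,UA,sup_01
054,DEF,NY,UA,sup_01
055,ABC,CAL,UA,sup_02
056,ABC,CAL,UA,sup_03
057,DEF,NY,UA,sup_02
"""


def add_internal_id(df, key_cols, width=3):
    """Hypothetical helper: return a copy with a zero-padded internal_id per group."""
    out = df.copy()
    # ngroup() numbers groups from 0; sort=False numbers them in order of first appearance.
    group_number = out.groupby(key_cols, sort=False).ngroup() + 1
    out['internal_id'] = group_number.astype(str).str.zfill(width)
    return out


# dtype=str keeps the leading zeros in unq_id.
df = pd.read_csv(io.StringIO(SAMPLE_CSV), dtype=str)
res = add_internal_id(df, ['name', 'city', 'country'])
res = res[['internal_id', 'unq_id', 'supplier']].sort_values(['internal_id', 'unq_id'])
print(res.to_string(index=False))
```

With `sort=False` the ids follow the order in which each group first appears in the file; the default `sort=True` numbers groups in sorted key order instead. For this sample both choices give the same ids.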