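I have a large CSV file with the columns Date, Store, ID_Empl, and Skill. I want to create a new column containing JSON that maps each employee who worked in a given store on a given day to that employee's skill.
My CSV file: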
Date Store ID_Empl Skill
20190517 9999 111756 1
20190517 9999 146465 2
20190519 C211 169838 3
20190519 C211 176859 1
20190521 C211 146465 2
20190510 D211 130171 1
20190510 D211 111756 2
The CSV file I want:
Date Store Empl_Skill
20190517 9999 {111756: 1, 146465: 2}
20190519 C211 {169838: 3, 176859: 1}
20190521 C211 {146465: 2}
20190510 D211 {130171: 1, 111756: 2}
Answer 0 (score: 0)
1. Read the CSV file with pd.read_csv:
import pandas as pd  # import the pandas library
df=pd.read_csv('data.csv')
print(df)
#Date Store ID_Empl Skill
#20190517 9999 111756 1
#20190517 9999 146465 2
#20190519 C211 169838 3
#20190519 C211 176859 1
#20190521 C211 146465 2
#20190510 D211 130171 1
#20190510 D211 111756 2
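The sample in the question is displayed with spaces between columns rather than commas. If the real file is laid out that way too, `pd.read_csv` needs an explicit separator. Below is a minimal sketch under that assumption, using a shortened inline copy of the sample instead of `data.csv`. Forcing `Store` to text is my own choice, not part of the original answer.

```python
import io

import pandas as pd

# Two rows of the question's sample, exactly as displayed: space-separated.
raw = """Date Store ID_Empl Skill
20190517 9999 111756 1
20190519 C211 169838 3
"""

# sep=r"\s+" splits on runs of whitespace; dtype keeps Store as text
# so that "9999" and "C211" are treated uniformly as store codes.
df = pd.read_csv(io.StringIO(raw), sep=r"\s+", dtype={"Store": str})
print(df)
```

If the file really is comma-separated, the plain `pd.read_csv('data.csv')` call is enough.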
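2. Use groupby.apply to build the output DataFrame (note that groupby sorts the group keys by default, so the rows come out ordered by Date and Store):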
new_df=df.groupby(['Date','Store']).apply(lambda x: dict(zip(x['ID_Empl'],x['Skill']))).rename('Empl_Skill').reset_index()
print(new_df)
#       Date Store              Empl_Skill
#0  20190510  D211  {130171: 1, 111756: 2}
#1  20190517  9999  {111756: 1, 146465: 2}
#2  20190519  C211  {169838: 3, 176859: 1}
#3  20190521  C211             {146465: 2}
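The column built this way holds Python dicts, whose text form (`{111756: 1}`) is not strictly valid JSON because JSON object keys must be strings. If real JSON is needed, one possible variant is sketched below. The helper name `to_json` and the inline sample data are illustrative assumptions, and `sort=False` is added only to keep the groups in the order they first appear in the file.

```python
import io
import json

import pandas as pd

# Sample data from the question, inlined (comma-separated) so the sketch runs on its own.
raw = """Date,Store,ID_Empl,Skill
20190517,9999,111756,1
20190517,9999,146465,2
20190519,C211,169838,3
20190519,C211,176859,1
20190521,C211,146465,2
20190510,D211,130171,1
20190510,D211,111756,2
"""
df = pd.read_csv(io.StringIO(raw))


def to_json(group):
    # Hypothetical helper: cast NumPy integers to plain Python values so that
    # json.dumps can serialize them; the employee IDs become string keys.
    pairs = zip(group["ID_Empl"], group["Skill"])
    return json.dumps({str(int(k)): int(v) for k, v in pairs})


# sort=False keeps groups in order of first appearance instead of sorting the keys.
json_df = (
    df.groupby(["Date", "Store"], sort=False)[["ID_Empl", "Skill"]]
    .apply(to_json)
    .rename("Empl_Skill")
    .reset_index()
)
print(json_df)
```

Each `Empl_Skill` value is now a string such as `{"111756": 1, "146465": 2}`, which `json.loads` can parse back.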
3. Save the result with DataFrame.to_csv:
new_df.to_csv('new_data.csv', index=False)  # index=False leaves out the 0, 1, 2, ... row index
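One thing to keep in mind when saving: `to_csv` writes each dict as its text representation, so the column comes back as plain strings when the file is read again. The sketch below shows one way to round-trip it, assuming the dict form from the answer. It writes to an in-memory buffer rather than `new_data.csv`, and the small `new_df` here is a hand-built stand-in using values from the question.

```python
import ast
import io

import pandas as pd

# A small frame shaped like new_df (values taken from the question).
new_df = pd.DataFrame(
    {
        "Date": [20190517, 20190521],
        "Store": ["9999", "C211"],
        "Empl_Skill": [{111756: 1, 146465: 2}, {146465: 2}],
    }
)

# Write to an in-memory buffer instead of a file; index=False omits the row index.
buffer = io.StringIO()
new_df.to_csv(buffer, index=False)
csv_text = buffer.getvalue()
print(csv_text)

# Read it back: keep Store as text and turn the dict strings into dicts again.
restored = pd.read_csv(
    io.StringIO(csv_text),
    dtype={"Store": str},
    converters={"Empl_Skill": ast.literal_eval},
)
print(restored.loc[0, "Empl_Skill"])
```

`ast.literal_eval` only evaluates Python literals, which makes it a safer choice than `eval` for this kind of parsing.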