I have a .csv with a large amount of data that generally looks like this:
Customer City Month Amount
Wayne E Gotham January 111
Wayne E Gotham January 222
Wayne E Chicago March 392
Wayne E Buffalo June 2928
Clark K Krypton January 100
Clark K Amman February 200
Clark K Detroit February 300
I am trying to create a summary dataframe that lists each customer, then each unique city they appear in, and then the sum of Amount for each month.
So, for the data above, I would like my output to look like:
Customer  City     January  February  March  April  May  June  ...  December
Wayne E   Gotham   333
Wayne E   Chicago                     392
Wayne E   Buffalo                                        2928
Clark K   Krypton  100
Clark K   Amman             200
Clark K   Detroit           300
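So far I have been able to get the unique customers and cities, but I am struggling with how to populate the month columns. I am not even sure I have set up my summary dataframe in the best way, so I am open to suggestions.
Here is what I have so far: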
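import pandas as pd

df = pd.read_csv("mycsv.csv", encoding='cp1252')
# Unique customers and cities found in the data
customers = df["Customer"].unique()
cities = df["City"].unique()
# Empty summary frame with one column per month (not yet populated)
summary_df = pd.DataFrame(columns=["Customer", "City", "January", "February", "March", "April", "May", "June", "July", "August", "September", "October", "November", "December"])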
Answer 0 (score: 1)
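Are you looking for pivot_table? Since you want totals, pass aggfunc='sum'; by default pivot_table averages rows that share the same Customer, City and Month.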
# aggfunc='sum' adds up duplicate Customer/City/Month rows (the default is the mean)
df.pivot_table(index=['Customer','City'],columns='Month',values='Amount',aggfunc='sum').reindex(columns=['January','February','March','April','May','June']).fillna('').reset_index()
Out[83]:
Month  Customer     City  January  February  March  April  May  June
0       Clark K    Amman                200
1       Clark K  Detroit                300
2       Clark K  Krypton      100
3       Wayne E  Buffalo                                        2928
4       Wayne E  Chicago                       392
5       Wayne E   Gotham      333
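For reference, here is a self-contained sketch of the same idea that covers all twelve months. It assumes the file is comma-separated with the four columns shown in the question; the inline csv_text and the months list are stand-ins I introduced for illustration, and filling empty cells with 0 rather than '' is a choice I made so that the month columns stay numeric.

```python
import io

import pandas as pd

# Sample rows from the question, inlined so the example runs on its own.
# With the real file this would be pd.read_csv("mycsv.csv", encoding="cp1252").
csv_text = """Customer,City,Month,Amount
Wayne E,Gotham,January,111
Wayne E,Gotham,January,222
Wayne E,Chicago,March,392
Wayne E,Buffalo,June,2928
Clark K,Krypton,January,100
Clark K,Amman,February,200
Clark K,Detroit,February,300
"""
df = pd.read_csv(io.StringIO(csv_text))

# Calendar order for the columns; pivot_table alone would sort them alphabetically.
months = ["January", "February", "March", "April", "May", "June",
          "July", "August", "September", "October", "November", "December"]

summary_df = (
    df.pivot_table(index=["Customer", "City"], columns="Month",
                   values="Amount", aggfunc="sum", fill_value=0)
      .reindex(columns=months, fill_value=0)  # adds months missing from the data
      .reset_index()
)
summary_df.columns.name = None  # drop the leftover "Month" label on the columns

print(summary_df)
```

Rows come back sorted by Customer and then City, which differs from the order in the file. Months with no rows at all, such as December here, appear as columns of zeros.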