I can't calculate the total fee by country, currency, and product ID across dfJANUARY and dfFEBRUARY. Python says "array is too big".
The file.txt behind dfJANUARY is 35.6 MB.
The file.txt behind dfFEBRUARY is 36.3 MB.
In[1]: dfJANUARY
Out[1]:
Country PRODUCT ID currency fee
0 Arab Emirate COCA COLA USD 1000
1 Arab Emirate COCA COLA USD 1000
2 Arab Emirate COCA COLA USD 1009
86212 rows × 6 columns (rows not shown also include Country: America; PRODUCT ID: Fanta; currency: SGD)
In[2]: dfFEBRUARY
Out[2]:
Country PRODUCT ID currency fee
0 Arab Emirate COCA COLA USD 2000
1 Arab Emirate COCA COLA USD 2000
2 Arab Emirate COCA COLA USD 2000
86212 rows × 6 columns (rows not shown also include Country: America; PRODUCT ID: Fanta; currency: SGD)
I have tried writing the code myself, but it failed:
df = pd.merge(dfJANUARY,dfFEBRUARY, on = "fee", how = "inner")
* When I merge, I get this error:
ValueError: array is too big; arr.size * arr.dtype.itemsize
# compute the total
TOTAL = dfJANUARY["fee"] + dfFEBRUARY["fee"]
# add it as a new column named "TOTAL"
df["TOTAL"] = TOTAL
# build the pivot table
gdf = df.pivot_table(index=["PRODUCT ID", "Country", "currency"], values="TOTAL", aggfunc="sum", fill_value=0)
What I expect is to sum the fee by currency, PRODUCT ID, and Country, so that I get the total for each combination.
Can you help me?
**expect**
dfEXPECT
TOTAL
Country PRODUCT ID currency
0 Arab Emirate COCA COLA USD 10000
SGD 15000
1 Arab Emirate Fanta USD 20000
SGD 30000
2 America COCA COLA USD 90000
SGD 95000
3 America Fanta USD 80000
SGD 75000
86212 rows × 6 columns
Answer 0 (score: 0)
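In your case you want to pd.concat the dataframes (put the second one "below" the first). I'm surprised that pd.merge fails, but merge is the harder operation (because it is a more general function).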
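One plausible reason the merge runs out of memory: merging on "fee" matches every January row with every February row that has the same fee, so repeated fee values multiply the row count. The sketch below shows the effect on toy data; the frames `jan` and `feb` and their values are made up for illustration and are not the asker's real data.

```python
import pandas as pd

# Hypothetical stand-ins for dfJANUARY / dfFEBRUARY (values are invented).
jan = pd.DataFrame({
    "Country": ["Arab Emirate"] * 3,
    "PRODUCT ID": ["COCA COLA"] * 3,
    "currency": ["USD"] * 3,
    "fee": [1000, 1000, 1009],
})
feb = pd.DataFrame({
    "Country": ["Arab Emirate"] * 3,
    "PRODUCT ID": ["COCA COLA"] * 3,
    "currency": ["USD"] * 3,
    "fee": [1000, 1000, 1000],
})

# Inner merge on a column with repeated values: each fee of 1000 in jan
# pairs with each fee of 1000 in feb (2 * 3 = 6 rows); 1009 has no match.
merged = pd.merge(jan, feb, on="fee", how="inner")

# The same row count can be estimated without building the merged frame,
# by multiplying the per-value counts of the key on both sides.
expected_rows = int(
    jan["fee"].value_counts().mul(feb["fee"].value_counts(), fill_value=0).sum()
)

print(len(merged), expected_rows)
```

Running the `expected_rows` estimate on the real frames before merging would show whether the result is far larger than the 86212 input rows.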
Try:
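df = pd.concat([dfJANUARY, dfFEBRUARY], ignore_index=True)
# after concat the column to sum is "fee"; there is no "TOTAL" column yet
gdf = df.pivot_table(index=["PRODUCT ID", "Country", "currency"], values="fee", aggfunc="sum", fill_value=0)
gdf = gdf.rename(columns={"fee": "TOTAL"})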
See if that helps...
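For reference, here is a minimal self-contained sketch of the concat-then-aggregate approach on toy data. It uses groupby instead of pivot_table, which should give the same sums here. The frames `jan` and `feb` and all their values are hypothetical; only the column names come from the question.

```python
import pandas as pd

cols = ["Country", "PRODUCT ID", "currency", "fee"]

# Hypothetical monthly data (values are invented for illustration).
jan = pd.DataFrame([
    ["Arab Emirate", "COCA COLA", "USD", 1000],
    ["Arab Emirate", "COCA COLA", "USD", 1009],
    ["America", "Fanta", "SGD", 500],
], columns=cols)
feb = pd.DataFrame([
    ["Arab Emirate", "COCA COLA", "USD", 2000],
    ["America", "Fanta", "SGD", 700],
], columns=cols)

# Stack the two months on top of each other; no join key is needed.
stacked = pd.concat([jan, feb], ignore_index=True)

# Sum the fee per (Country, PRODUCT ID, currency) and name the result TOTAL.
total = (
    stacked.groupby(["Country", "PRODUCT ID", "currency"])["fee"]
    .sum()
    .rename("TOTAL")
    .reset_index()
)

print(total)
```

Because concat only appends rows, the intermediate frame has as many rows as the two inputs combined, so memory use stays proportional to the input size.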