ValueError: array is too big (arr.size) when merging and summing more than 2 DataFrames by column headers

Time: 2019-05-25 19:36:42

Tags: python pandas

I can't calculate the total fee by country, currency, and product ID across dfJANUARY and dfFEBRUARY. Python says "array is too big".

My file.txt for dfJANUARY is 35.6 MB.

My file.txt for dfFEBRUARY is 36.3 MB.

In[1]: dfJANUARY
Out[1]:
  Country         PRODUCT ID    currency   fee

0  Arab Emirate    COCA COLA      USD       1000
1  Arab Emirate    COCA COLA      USD       1000
2  Arab Emirate    COCA COLA      USD       1009

86212 rows × 6 columns (rows not shown include Country: America; PRODUCT ID: Fanta; currency: SGD)

In[2]: dfFEBRUARY
Out[2]:
  Country         PRODUCT ID    currency   fee

0  Arab Emirate    COCA COLA      USD       2000
1  Arab Emirate    COCA COLA      USD       2000
2  Arab Emirate    COCA COLA      USD       2000

86212 rows × 6 columns (rows not shown include Country: America; PRODUCT ID: Fanta; currency: SGD)

I have tried writing the code below, but it fails:

df = pd.merge(dfJANUARY,dfFEBRUARY, on = "fee", how = "inner")

When I merge, I get this error:
ValueError: array is too big arr.size * arr.dtype.itemsize

# compute the total fee
TOTAL = dfJANUARY["fee"] + dfFEBRUARY["fee"]

# add a new column named "TOTAL"
df["TOTAL"] = TOTAL

# build the pivot table
gdf = df.pivot_table(index=["PRODUCT ID", "Country", "currency"], values="TOTAL", aggfunc="sum", fill_value=0)

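This is what I expect: I want to sum the fee by currency, product ID, and country, so that I get the overall total for each combination.

Can you help me?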

**Expected output**
 dfEXPECT
                                           TOTAL
  Country         PRODUCT ID    currency   

0  Arab Emirate    COCA COLA      USD       10000
                                  SGD       15000
1  Arab Emirate    Fanta          USD       20000
                                  SGD       30000
2  America         COCA COLA      USD       90000
                                  SGD       95000
3  America         Fanta          USD       80000
                                  SGD       75000
86212 rows × 6 columns

1 Answer:

Answer 0: (score: 0)

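In your case, you want to pd.concat the DataFrames (that is, stack the second one below the first). I'm surprised that pd.merge fails, but merge is the harder operation (because it is a more general function).
Try: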

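import pandas as pd
# stack both months row-wise, then sum "fee" per group (there is no "TOTAL" column after concat)
df = pd.concat([dfJANUARY, dfFEBRUARY], ignore_index=True)
gdf = df.pivot_table(index=["PRODUCT ID", "Country", "currency"], values="fee", aggfunc="sum", fill_value=0)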
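To check the concat-then-aggregate idea on something small, here is a minimal sketch. The two tiny DataFrames are made-up stand-ins for the real files (their values are assumptions, not the asker's data), and `groupby(...).sum()` is used as an equivalent alternative to `pivot_table` for this kind of sum.

```python
import pandas as pd

# Hypothetical miniature versions of the two monthly tables
dfJANUARY = pd.DataFrame({
    "Country": ["Arab Emirate", "Arab Emirate", "America"],
    "PRODUCT ID": ["COCA COLA", "COCA COLA", "Fanta"],
    "currency": ["USD", "USD", "SGD"],
    "fee": [1000, 1000, 500],
})
dfFEBRUARY = pd.DataFrame({
    "Country": ["Arab Emirate", "America", "America"],
    "PRODUCT ID": ["COCA COLA", "Fanta", "Fanta"],
    "currency": ["USD", "SGD", "SGD"],
    "fee": [2000, 700, 300],
})

keys = ["Country", "PRODUCT ID", "currency"]

# Stack the months row-wise; the row count is just len(january) + len(february)
stacked = pd.concat([dfJANUARY, dfFEBRUARY], ignore_index=True)

# Sum the fee per (Country, PRODUCT ID, currency) group and name the result "TOTAL"
total = stacked.groupby(keys)["fee"].sum().rename("TOTAL").reset_index()
print(total)
```

Because concat only appends rows, memory use grows linearly with the input size, so two files of about 36 MB each should be unproblematic.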

See if that helps...
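A likely explanation for the merge failure, offered as an assumption rather than a confirmed diagnosis: merging on "fee" is a many-to-many join, so every January row is paired with every February row that has the same fee. If many rows share a fee value, two tables of 86212 rows can produce up to 86212 × 86212 (roughly 7.4 billion) rows. The sketch below uses made-up data to show the effect and to estimate the merged row count before merging.

```python
import pandas as pd

# Hypothetical data: every row in both tables has the same fee value
left = pd.DataFrame({"fee": [1000] * 300, "Country": ["Arab Emirate"] * 300})
right = pd.DataFrame({"fee": [1000] * 300, "Country": ["Arab Emirate"] * 300})

# Each of the 300 left rows matches all 300 right rows
merged = pd.merge(left, right, on="fee", how="inner")
print(len(left), len(right), len(merged))  # 300 300 90000

# Estimate the inner-merge row count without performing the merge:
# for each key value, multiply its count on the left by its count on the right
left_counts = left["fee"].value_counts()
right_counts = right["fee"].value_counts()
estimated_rows = int((left_counts * right_counts).sum())
print(estimated_rows)  # 90000
```

Running the same estimate on the real dfJANUARY and dfFEBRUARY would show whether the merge result is simply too large to allocate.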