Pandas merge returns an empty DataFrame

Time: 2019-05-22 05:09:45

Tags: python-3.x pandas

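I am loading two DataFrames from two different CSV files and trying to merge them, but for some reason pd.merge does not join the data and returns an empty DataFrame. I tried changing the data types, but the result is still empty. Any help is appreciated.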

Sample mon_apps DataFrame:

app_cnts | year | month
16634    | 2018    |  8 
9636     | 2019    |  2 
17402    | 2017    |  8 
17472    | 2017    |  11 
15689    | 2018    |  6 

Sample mon_spend DataFrame:

channel  | month   | year  | spend
FB       | 1       |  2017 | 0
FB       | 2       |  2017 | 0
FB       | 3       |  2017 | 0
FB       | 4       |  2017 | 0
FB       | 5       |  2017 | 0

I changed the data types like this (just to make sure that is not the problem):

mon_spend[['month', 'year', 'spend']] = mon_spend[['month', 'year', 'spend']].astype(np.int64)
mon_spend['channel'] = mon_spend['channel'].astype(str)
mon_apps = mon_apps.astype(np.int64)

I checked the data types:

mon_spend
channel    object
month       int64
year        int64
spend       int64
dtype: object

mon_apps
app_cnts    int64
year        int64
month       int64
dtype: object

I join them with pd.merge like this:

pd.merge(mon_apps[['app_cnts', 'year', 'month']], mon_spend, left_on = ["year", "month"], right_on = ["year", "month"])

Any help is appreciated. Thanks.

More data

channel  | month   | year  | spend
FB       | 2017    | 1     | 0
FB       | 2017    | 2     | 0
FB       | 2017    | 3     | 0
FB       | 2017    | 4     | 0
FB       | 2017    | 5     | 0
FB       | 2017    | 6     | 0
FB       | 2017    | 7     | 52514
FB       | 2017    | 8     | 10198
FB       | 2017    | 9     | 25408
FB       | 2017    | 10    | 31333
FB       | 2017    | 11    | 128071
FB       | 2017    | 12    | 95160
FB       | 2018    | 1     | 5001
FB       | 2018    | 2     | 17929
FB       | 2018    | 3     | 84548
FB       | 2018    | 4     | 16414
FB       | 2018    | 5     | 28282
FB       | 2018    | 6     | 38430
FB       | 2018    | 7     | 58757
FB       | 2018    | 8     | 120722
FB       | 2018    | 9     | 143766
FB       | 2018    | 10    | 68400
FB       | 2018    | 11    | 66984
FB       | 2018    | 12    | 58228

More information

print(mon_spend[["year", "month"]].sort_values(["year", "month"]).drop_duplicates().values.tolist())
[[2017, 1], [2017, 2], [2017, 3], [2017, 4], [2017, 5]]

print(mon_apps[["year", "month"]].sort_values(["year", "month"]).drop_duplicates().values.tolist())

[[2017, 8], [2017, 11], [2018, 6], [2018, 8], [2019, 2]]

[[1, 2017], [1, 2018], [1, 2019], [2, 2017], [2, 2018], [2, 2019], [3, 2017], [3, 2018], [4, 2017], [4, 2018], [5, 2017], [5, 2018], [6, 2017], [6, 2018], [7, 2017], [7, 2018], [8, 2017], [8, 2018], [9, 2017], [9, 2018], [10, 2017], [10, 2018], [11, 2017], [11, 2018], [12, 2017], [12, 2018]]
[[2017, 1], [2017, 2], [2017, 3], [2017, 4], [2017, 5], [2017, 6], [2017, 7], [2017, 8], [2017, 9], [2017, 10], [2017, 11], [2017, 12], [2018, 1], [2018, 2], [2018, 3], [2018, 4], [2018, 5], [2018, 6], [2018, 7], [2018, 8], [2018, 9], [2018, 10], [2018, 11], [2018, 12], [2019, 1], [2019, 2], [2019, 3]]





2 answers:

Answer 0 (score: 3)

The month and year columns are swapped in the file you are merging. Either swap the keys when merging, or rename the columns before merging:

mon_apps.merge(complete_file, left_on=["year", "month"],right_on=['month','year'])
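
To make the effect easy to reproduce, here is a minimal sketch. It assumes that complete_file in the line above refers to the asker's mon_spend, and it builds small stand-in frames from values in the question instead of reading the CSV files.

```python
import pandas as pd

# Stand-in for mon_apps, using the sample rows from the question.
mon_apps = pd.DataFrame({
    "app_cnts": [16634, 9636, 17402, 17472, 15689],
    "year": [2018, 2019, 2017, 2017, 2018],
    "month": [8, 2, 8, 11, 6],
})

# Stand-in for mon_spend with the swapped labels:
# the "month" column holds years and the "year" column holds months.
mon_spend = pd.DataFrame({
    "channel": ["FB"] * 4,
    "month": [2017, 2017, 2018, 2018],
    "year": [8, 11, 6, 8],
    "spend": [10198, 128071, 38430, 120722],
})

# Joining on identical names compares years with months, so no row matches.
naive = mon_apps.merge(mon_spend, on=["year", "month"])

# Pair each left key with the right column that actually holds that value.
swapped = mon_apps.merge(
    mon_spend, left_on=["year", "month"], right_on=["month", "year"]
)

print(len(naive), len(swapped))  # prints: 0 4
```

The 2019/2 row of mon_apps has no counterpart in this stand-in data, which is why the inner join keeps four of the five rows.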

Answer 1 (score: 2)

Edit:

The problem is that the month column is swapped with the year column.
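
One quick way to spot this kind of swap is to look at the value range of each key column. The sketch below is only an illustration on made-up stand-in rows, and the looks_swapped rule is an assumption of mine rather than anything pandas provides.

```python
import pandas as pd

# Stand-in rows with the labels swapped, as in the asker's file.
mon_spend = pd.DataFrame({
    "channel": ["FB"] * 3,
    "month": [2017, 2017, 2018],
    "year": [1, 12, 6],
    "spend": [0, 95160, 38430],
})

# Minimum and maximum of each key column.
ranges = mon_spend[["year", "month"]].agg(["min", "max"])
print(ranges)

# Heuristic (an assumption): a "year" that never exceeds 12 together with a
# "month" that is always above 12 suggests the two labels are swapped.
looks_swapped = bool(
    ranges.loc["max", "year"] <= 12 and ranges.loc["min", "month"] > 12
)
print(looks_swapped)
```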

A better solution is to rename the columns, either with the rename function or by fixing the headers in the file:

df = pd.merge(mon_apps, mon_spend.rename(columns={'year':'month','month':'year'}), 
              on = ["year", "month"])
print (df)
   app_cnts  year  month channel   spend
0     16634  2018      8      FB  120722
1     17402  2017      8      FB   10198
2     17472  2017     11      FB  128071
3     15689  2018      6      FB   38430
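
For the second option, the labels can also be corrected while loading instead of editing the file. The following is a sketch under assumptions: the CSV text is a hypothetical two-row stand-in for the real file, read from an in-memory buffer, and header=0 together with names is used to replace the header row of the file.

```python
import io

import pandas as pd

# Hypothetical file contents: the header says month, year,
# but the values are actually year, month.
csv_text = """channel,month,year,spend
FB,2017,8,10198
FB,2018,6,38430
"""

# header=0 discards the file's own header row; names supplies corrected labels.
mon_spend = pd.read_csv(
    io.StringIO(csv_text),
    header=0,
    names=["channel", "year", "month", "spend"],
)

mon_apps = pd.DataFrame({
    "app_cnts": [17402, 15689],
    "year": [2017, 2018],
    "month": [8, 6],
})

# With consistent labels a plain on= merge is enough and adds no suffixes.
df = mon_apps.merge(mon_spend, on=["year", "month"])
print(df)
```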

Otherwise, the overlapping column names in the result get the _x and _y suffixes:

df = mon_apps.merge(mon_spend, left_on=["year", "month"],right_on=['month','year'])
print (df)
   app_cnts  year_x  month_x channel  month_y  year_y   spend
0     16634    2018        8      FB     2018       8  120722
1     17402    2017        8      FB     2017       8   10198
2     17472    2017       11      FB     2017      11  128071
3     15689    2018        6      FB     2018       6   38430
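
If the swapped-key merge is used anyway, the duplicated key columns can be cleaned up afterwards. This is a minimal sketch on two stand-in rows. The name tidy is mine, and the column names being dropped assume the default _x and _y suffixes.

```python
import pandas as pd

mon_apps = pd.DataFrame({
    "app_cnts": [16634, 17402],
    "year": [2018, 2017],
    "month": [8, 8],
})
# Stand-in for mon_spend with the swapped labels.
mon_spend = pd.DataFrame({
    "channel": ["FB", "FB"],
    "month": [2017, 2018],
    "year": [8, 8],
    "spend": [10198, 120722],
})

df = mon_apps.merge(
    mon_spend, left_on=["year", "month"], right_on=["month", "year"]
)

# Drop the right-hand copies of the keys and restore the plain names.
tidy = df.drop(columns=["month_y", "year_y"]).rename(
    columns={"year_x": "year", "month_x": "month"}
)
print(tidy)
```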