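I am loading two dataframes from two different CSV files and trying to merge them, but for some reason pd.merge
does not join the data and returns an empty dataframe. I tried changing the data types, but that did not help.
Sample mon_apps dataframe: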
app_cnts | year | month
16634 | 2018 | 8
9636 | 2019 | 2
17402 | 2017 | 8
17472 | 2017 | 11
15689 | 2018 | 6
Sample mon_spend dataframe:
channel | month | year | spend
FB | 1 | 2017 | 0
FB | 2 | 2017 | 0
FB | 3 | 2017 | 0
FB | 4 | 2017 | 0
FB | 5 | 2017 | 0
I changed the data types like this (just to make sure that was not the problem):
mon_spend[['month', 'year', 'spend']] = mon_spend[['month', 'year', 'spend']].astype(np.int64)
mon_spend['channel'] = mon_spend['channel'].astype(str)
mon_apps = mon_apps.astype(np.int64)
I checked the data types:
mon_spend
channel object
month int64
year int64
spend int64
dtype: object
mon_apps
app_cnts int64
year int64
month int64
dtype: object
I use pd.merge to join them like this:
pd.merge(mon_apps[['app_cnts', 'year', 'month']], mon_spend, left_on = ["year", "month"], right_on = ["year", "month"])
Any help is appreciated. Thanks.
More data
channel month year spend
FB 2017 1 0
FB 2017 2 0
FB 2017 3 0
FB 2017 4 0
FB 2017 5 0
FB 2017 6 0
FB 2017 7 52514
FB 2017 8 10198
FB 2017 9 25408
FB 2017 10 31333
FB 2017 11 128071
FB 2017 12 95160
FB 2018 1 5001
FB 2018 2 17929
FB 2018 3 84548
FB 2018 4 16414
FB 2018 5 28282
FB 2018 6 38430
FB 2018 7 58757
FB 2018 8 120722
FB 2018 9 143766
FB 2018 10 68400
FB 2018 11 66984
FB 2018 12 58228
More info
print (mon_spend[["year", "month"]].sort_values(["year", "month"]).drop_duplicates().values.tolist())
[[2017, 1], [2017, 2], [2017, 3], [2017, 4], [2017, 5]]
print (mon_apps[["year", "month"]].sort_values(["year", "month"]).drop_duplicates().values.tolist())
[[2017, 8], [2017, 11], [2018, 6], [2018, 8], [2019, 2]]
[[1, 2017], [1, 2018], [1, 2019], [2, 2017], [2, 2018], [2, 2019], [3, 2017], [3, 2018], [4, 2017], [4, 2018], [5, 2017], [5, 2018], [6, 2017], [6, 2018], [7, 2017], [7, 2018], [8, 2017], [8, 2018], [9, 2017], [9, 2018], [10, 2017], [10, 2018], [11, 2017], [11, 2018], [12, 2017], [12, 2018]]
[[2017, 1], [2017, 2], [2017, 3], [2017, 4], [2017, 5], [2017, 6], [2017, 7], [2017, 8], [2017, 9], [2017, 10], [2017, 11], [2017, 12], [2018, 1], [2018, 2], [2018, 3], [2018, 4], [2018, 5], [2018, 6], [2018, 7], [2018, 8], [2018, 9], [2018, 10], [2018, 11], [2018, 12], [2019, 1], [2019, 2], [2019, 3]]
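The key comparison above can also be done in one step. The following is a minimal sketch, not part of the original post: it rebuilds small stand-in frames from rows shown in the question (with the spend columns labelled the way the "More data" table labels them) and uses an outer merge with `indicator=True` to count how many key pairs exist on both sides. The names `keys`, `check`, and `n_both` are made up for illustration.

```python
import pandas as pd

# Stand-in for mon_apps, using the five sample rows from the question.
mon_apps = pd.DataFrame({
    "app_cnts": [16634, 9636, 17402, 17472, 15689],
    "year": [2018, 2019, 2017, 2017, 2018],
    "month": [8, 2, 8, 11, 6],
})

# Stand-in for mon_spend: a few rows from "More data", keeping the labels
# as they appear there (the "month" column holds years and vice versa).
mon_spend = pd.DataFrame({
    "channel": ["FB"] * 5,
    "month": [2017, 2017, 2017, 2018, 2018],
    "year": [1, 8, 11, 6, 8],
    "spend": [0, 10198, 128071, 38430, 120722],
})

keys = ["year", "month"]

# Outer-merge only the distinct key pairs; the _merge column reports whether
# each pair came from the left frame, the right frame, or both.
check = pd.merge(
    mon_apps[keys].drop_duplicates(),
    mon_spend[keys].drop_duplicates(),
    on=keys,
    how="outer",
    indicator=True,
)

n_both = int((check["_merge"] == "both").sum())
print(check)
print("key pairs present in both frames:", n_both)
```

If no pair is marked `both`, an inner merge on those keys can only return an empty dataframe, whatever the dtypes are.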
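Answer 0 (score: 3)
The month and year columns are swapped in the file you are merging. Either swap the keys when merging, or rename the columns before merging: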
mon_apps.merge(complete_file, left_on=["year", "month"],right_on=['month','year'])
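To see the effect on a small scale, here is a hedged, self-contained sketch. The two frames are stand-ins built from rows that appear in the question, with the spend columns mislabelled as in the "More data" table; `empty` and `fixed` are names invented for this example.

```python
import pandas as pd

# Stand-in for mon_apps (sample rows from the question).
mon_apps = pd.DataFrame({
    "app_cnts": [16634, 9636, 17402, 17472, 15689],
    "year": [2018, 2019, 2017, 2017, 2018],
    "month": [8, 2, 8, 11, 6],
})

# Stand-in for the spend file: "month" actually holds the year
# and "year" actually holds the month.
mon_spend = pd.DataFrame({
    "channel": ["FB"] * 5,
    "month": [2017, 2017, 2017, 2018, 2018],
    "year": [1, 8, 11, 6, 8],
    "spend": [0, 10198, 128071, 38430, 120722],
})

# Merging on the labels as written finds no matching key pairs.
empty = pd.merge(mon_apps, mon_spend, on=["year", "month"])

# Swapping the right-hand keys pairs the left year with the right "month"
# column (which really contains years), and likewise for months.
fixed = mon_apps.merge(
    mon_spend,
    left_on=["year", "month"],
    right_on=["month", "year"],
)

print(len(empty), len(fixed))
```

The row for 2019/2 has no counterpart in these stand-in spend rows, so the swapped-key merge returns four of the five mon_apps rows.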
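Answer 1 (score: 2)
Edit:
The problem is that the month and year columns are swapped.
The better solution is to rename them, either with the rename function
or by fixing the headers in the file: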
df = pd.merge(mon_apps, mon_spend.rename(columns={'year':'month','month':'year'}),
on = ["year", "month"])
print (df)
   app_cnts  year  month channel   spend
0     16634  2018      8      FB  120722
1     17402  2017      8      FB   10198
2     17472  2017     11      FB  128071
3     15689  2018      6      FB   38430
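If editing the file itself is not convenient, the headers can also be overridden while reading it. This is only a sketch under assumptions: `csv_text` is a hypothetical comma-separated stand-in for the real spend file (its layout is not shown in the question), and it relies on `pd.read_csv` replacing the existing header row when `header=0` is passed together with `names`.

```python
import io

import pandas as pd

# Hypothetical stand-in for the spend CSV, with the mislabelled header row.
csv_text = """channel,month,year,spend
FB,2017,8,10198
FB,2017,11,128071
FB,2018,6,38430
FB,2018,8,120722
"""

# header=0 tells read_csv that the first row is a header to discard;
# names supplies the corrected labels in file column order.
mon_spend = pd.read_csv(
    io.StringIO(csv_text),
    header=0,
    names=["channel", "year", "month", "spend"],
)

# Stand-in for mon_apps (sample rows from the question).
mon_apps = pd.DataFrame({
    "app_cnts": [16634, 9636, 17402, 17472, 15689],
    "year": [2018, 2019, 2017, 2017, 2018],
    "month": [8, 2, 8, 11, 6],
})

# With correct labels, a plain merge on the shared keys is enough.
df = pd.merge(mon_apps, mon_spend, on=["year", "month"])
print(df)
```

Fixing the labels once at load time keeps every later merge or groupby on year and month free of special cases.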
Otherwise, merging with swapped keys keeps both sets of key columns, and their names get the _x and _y suffixes:
df = mon_apps.merge(mon_spend, left_on=["year", "month"],right_on=['month','year'])
print (df)
   app_cnts  year_x  month_x channel  month_y  year_y   spend
0     16634    2018        8      FB     2018       8  120722
1     17402    2017        8      FB     2017       8   10198
2     17472    2017       11      FB     2017      11  128071
3     15689    2018        6      FB     2018       6   38430
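If the swapped-key merge is used anyway, the redundant right-hand key columns can be removed afterwards. The sketch below is an illustration rather than part of the original answer; it uses small stand-in frames built from rows in the question, and `tidy` is a made-up name.

```python
import pandas as pd

# Stand-in for mon_apps (sample rows from the question).
mon_apps = pd.DataFrame({
    "app_cnts": [16634, 9636, 17402, 17472, 15689],
    "year": [2018, 2019, 2017, 2017, 2018],
    "month": [8, 2, 8, 11, 6],
})

# Stand-in for mon_spend with the swapped labels.
mon_spend = pd.DataFrame({
    "channel": ["FB"] * 4,
    "month": [2017, 2017, 2018, 2018],
    "year": [8, 11, 6, 8],
    "spend": [10198, 128071, 38430, 120722],
})

df = mon_apps.merge(
    mon_spend,
    left_on=["year", "month"],
    right_on=["month", "year"],
)

# The *_y columns duplicate the left-hand keys, so drop them and
# strip the _x suffix from the columns that remain.
tidy = (
    df.drop(columns=["month_y", "year_y"])
      .rename(columns={"year_x": "year", "month_x": "month"})
)
print(tidy)
```

This yields the same column layout as the rename-based merge, at the cost of an extra cleanup step.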