KeyError on pandas merge (left join)

Time: 2015-03-24 20:36:11

Tags: python pandas merge

I have two dataframes, df_purchase (1) and df_login (2):

+--------+-----+--------+------------+--------------------+-------------+--------------------------+
|        | age | gender |    ttp     |       count        | sum(amount) |          region          |
+--------+-----+--------+------------+--------------------+-------------+--------------------------+
|  49427 | 63  | M      | 824.731412 | 2                  | 25.00       | Omaha, Nebraska          |
|  28433 | 49  | M      | 1.166250   | 2                  | 41.94       | Catasauqua, Pennsylvania |
|   4162 | 29  | M      | 5.620949   | 2                  | 51.78       | Eagle Center, Iowa       |
|  18747 | 43  | M      | 153.502072 | 2                  | 23.84       | Pacific, Washington      |
|  45173 | 59  | M      | 0.027257   | 2                  | 13.98       | De Soto, Missouri        |
+--------+-----+--------+------------+--------------------+-------------+--------------------------+

+--------+-----+--------+-------+--------------------+
|        | age | gender | count |       region       |
+--------+-----+--------+-------+--------------------+
| 671766 | 84  | M      | 13900 | New York, New York |
| 671166 | 84  | F      | 7619  | New York, New York |
| 672209 | 85  | F      | 6483  | New York, New York |
| 672671 | 85  | M      | 5808  | New York, New York |
| 195201 | 34  | M      | 3817  | New York, New York |
+--------+-----+--------+-------+--------------------+

I am trying to join df_login onto df_purchase on age, gender and region with the following pandas code:

df = pd.merge(df_purchase, df_login[['count']],
                       how='left', on=['age', 'gender', 'region'])

However, I keep getting this error: KeyError: 'age'. Any ideas?

1 Answer:

Answer 0: (score: 6)

The KeyError is raised here:

df = pd.merge(df_purchase, df_login[['count']],  # <- this selects just the count column
                       how='left', on=['age', 'gender', 'region'])

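You have selected only a single column from df_login, so the key columns age, gender and region no longer exist on the right-hand side of the merge. You need this instead: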

df = pd.merge(df_purchase, df_login,
                       how='left', on=['age', 'gender', 'region'])
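
If the intent was to bring over only the count column from df_login, one option is to keep the join keys next to it in the selection. The following is a minimal sketch under assumptions, not the asker's real data: the two small frames and the suffix names "_purchase" and "_login" are made up for illustration. Since both frames have a count column, pandas would otherwise label them count_x and count_y.

```python
import pandas as pd

# Made-up frames that only mimic the shape of df_purchase and df_login
df_purchase = pd.DataFrame({
    "age": [63, 49],
    "gender": ["M", "M"],
    "count": [2, 2],
    "region": ["Omaha, Nebraska", "Catasauqua, Pennsylvania"],
})
df_login = pd.DataFrame({
    "age": [63, 84],
    "gender": ["M", "M"],
    "count": [10, 13900],
    "region": ["Omaha, Nebraska", "New York, New York"],
})

keys = ["age", "gender", "region"]

# Reproduce the problem: the right frame holds only 'count', so the keys are missing
try:
    pd.merge(df_purchase, df_login[["count"]], how="left", on=keys)
    raised = False
except KeyError:
    raised = True

# Select the join keys together with 'count' so the merge can find them
df = pd.merge(
    df_purchase,
    df_login[keys + ["count"]],
    how="left",
    on=keys,
    suffixes=("_purchase", "_login"),  # hypothetical names for the two count columns
)
print(df)
```

Rows of df_purchase without a matching login row keep their place in the result and get NaN in count_login.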

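I'm assuming this is not your complete data, because the df_login rows shown share no age or region values with df_purchase, so a left join on this sample would match nothing.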
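
To check how many rows actually find a match, the indicator parameter of pd.merge can help. This is a small sketch with made-up frames (the names left and right and their values are assumptions for illustration): it adds a _merge column that marks each row as "both" or "left_only".

```python
import pandas as pd

# Made-up frames: only the first row of `left` has a partner in `right`
left = pd.DataFrame({
    "age": [63, 49],
    "gender": ["M", "M"],
    "region": ["Omaha, Nebraska", "De Soto, Missouri"],
})
right = pd.DataFrame({
    "age": [63],
    "gender": ["M"],
    "region": ["Omaha, Nebraska"],
    "count": [10],
})

# indicator=True adds a '_merge' column describing where each row came from
merged = pd.merge(left, right, how="left",
                  on=["age", "gender", "region"], indicator=True)

# Rows from the left frame that found no match on the right
unmatched = merged[merged["_merge"] == "left_only"]
print(unmatched)
```

If nearly every row comes back as left_only, the key columns probably differ in values or dtype between the two frames.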