Conditional merge with nested ranking in pandas

Time: 2019-11-06 11:25:15

Tags: pandas merge

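I am trying to translate a conditional merge with nested ranking from SQL to Python/pandas. Specifically, I want to merge two tables and add a condition that guarantees a 1:1 relationship and specifies which value to take. In SQL this would be done with a subquery that computes ranks and is left-joined with a condition.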

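Example

I am merging a table of customer records with a table of customer requests. For each record, the result should show the latest request made at or before the record's timestamp.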

table: Customer_records
+---------+------+------------+
| Cust_ID | Name | Timestamp  |
+---------+------+------------+
|       1 | A    | 2013-01-01 |
|       1 | A    | 2014-01-01 |
|       1 | A    | 2015-12-01 |
|       2 | B    | 2014-01-01 |
|       3 | C    | 2014-01-01 |
+---------+------+------------+

table: customer_request
+--------+---------+------------+
| Req_ID | Cust_ID | Timestamp  |
+--------+---------+------------+
|      1 |       1 | 2013-01-01 |
|      2 |       1 | 2013-12-01 |
|      3 |       1 | 2015-01-01 |
|      4 |       2 | 2013-01-01 |
+--------+---------+------------+

table: merged
+---------+------+------------+--------+
| Cust_ID | Name | Timestamp  | Req_ID |
+---------+------+------------+--------+
|       1 | A    | 2013-01-01 | 1      |
|       1 | A    | 2014-01-01 | 2      |
|       1 | A    | 2015-12-01 | 3      |
|       2 | B    | 2014-01-01 | 4      |
|       3 | C    | 2014-01-01 | None   |
+---------+------+------------+--------+

1 answer:

Answer 0 (score: 1)

Use pd.merge_asof; you only need to sort both DataFrames by the Timestamp column first:

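import pandas as pd

# Parse the timestamps so they compare as dates rather than strings
Customer_records['Timestamp'] = pd.to_datetime(Customer_records['Timestamp'])
customer_request['Timestamp'] = pd.to_datetime(customer_request['Timestamp'])

# merge_asof requires both frames to be sorted by the "on" key
Customer_records = Customer_records.sort_values('Timestamp')
customer_request = customer_request.sort_values('Timestamp')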

# Match within each Cust_ID; the default direction is backward (at or before)
df = pd.merge_asof(Customer_records, customer_request, on='Timestamp', by='Cust_ID')
print(df)

Output:

   Cust_ID Name  Timestamp  Req_ID
0        1    A 2013-01-01     1.0
1        1    A 2014-01-01     2.0
2        2    B 2014-01-01     4.0
3        3    C 2014-01-01     NaN
4        1    A 2015-12-01     3.0
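
For comparison, here is a minimal self-contained sketch that runs the merge_asof approach on the sample data from the question and then cross-checks it against a more literal translation of the SQL pattern (join, filter, rank, keep rank 1). The names records, requests, pairs, best and rnk are only illustrative and do not come from the question. The rank-based variant also assumes that (Cust_ID, Timestamp) uniquely identifies a record. The final sort and the cast to the nullable Int64 dtype are optional tidying so the result lines up with the expected "merged" table.

```python
import pandas as pd

# Sample data copied from the tables in the question.
records = pd.DataFrame({
    "Cust_ID": [1, 1, 1, 2, 3],
    "Name": ["A", "A", "A", "B", "C"],
    "Timestamp": pd.to_datetime(
        ["2013-01-01", "2014-01-01", "2015-12-01", "2014-01-01", "2014-01-01"]
    ),
})
requests = pd.DataFrame({
    "Req_ID": [1, 2, 3, 4],
    "Cust_ID": [1, 1, 1, 2],
    "Timestamp": pd.to_datetime(
        ["2013-01-01", "2013-12-01", "2015-01-01", "2013-01-01"]
    ),
})

# 1) merge_asof: for each record, pick the last request of the same customer
#    whose Timestamp is <= the record's Timestamp (direction="backward").
asof = pd.merge_asof(
    records.sort_values("Timestamp"),
    requests.sort_values("Timestamp"),
    on="Timestamp",
    by="Cust_ID",
    direction="backward",
)
# Optional tidying: order by customer and keep Req_ID as a nullable integer.
asof = asof.sort_values(["Cust_ID", "Timestamp"]).reset_index(drop=True)
asof["Req_ID"] = asof["Req_ID"].astype("Int64")

# 2) Literal translation of the SQL idea: join, filter, rank, keep rank 1.
pairs = records.merge(requests, on="Cust_ID", how="inner", suffixes=("", "_req"))
pairs = pairs[pairs["Timestamp_req"] <= pairs["Timestamp"]].copy()
# Rank the candidate requests per record, latest request first.
pairs["rnk"] = (
    pairs.groupby(["Cust_ID", "Timestamp"])["Timestamp_req"]
    .rank(method="first", ascending=False)
)
best = pairs.loc[pairs["rnk"] == 1, ["Cust_ID", "Timestamp", "Req_ID"]]
# Left join back so records without any matching request are kept.
ranked = records.merge(best, on=["Cust_ID", "Timestamp"], how="left")
ranked = ranked.sort_values(["Cust_ID", "Timestamp"]).reset_index(drop=True)
ranked["Req_ID"] = ranked["Req_ID"].astype("Int64")

print(asof)
print(ranked)
```

On this sample both approaches should assign the same Req_ID to every record. merge_asof avoids building the intermediate join of all record/request pairs per customer, which is the main reason to prefer it on larger tables.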