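I am trying to translate a conditional merge with nested ranking from SQL to Python/pandas. Specifically, I want to merge two tables with a condition that guarantees a 1:1 relationship and specifies which value to take. In SQL this would be done with a subquery that uses ranks, left-joined on the condition.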
Example
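I merge the customer records table with the customer request table. For each record, the result should show the most recent request at or before that record's timestamp.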
table: Customer_records
+---------+------+------------+
| Cust_ID | Name | Timestamp  |
+---------+------+------------+
| 1       | A    | 2013-01-01 |
| 1       | A    | 2014-01-01 |
| 1       | A    | 2015-12-01 |
| 2       | B    | 2014-01-01 |
| 3       | C    | 2014-01-01 |
+---------+------+------------+
table: customer_request
+--------+---------+------------+
| Req_ID | Cust_ID | Timestamp  |
+--------+---------+------------+
| 1      | 1       | 2013-01-01 |
| 2      | 1       | 2013-12-01 |
| 3      | 1       | 2015-01-01 |
| 4      | 2       | 2013-01-01 |
+--------+---------+------------+
table: merged
+---------+------+------------+--------+
| Cust_ID | Name | Timestamp  | Req_ID |
+---------+------+------------+--------+
| 1       | A    | 2013-01-01 | 1      |
| 1       | A    | 2014-01-01 | 2      |
| 1       | A    | 2015-12-01 | 3      |
| 2       | B    | 2014-01-01 | 4      |
| 3       | C    | 2014-01-01 | None   |
+---------+------+------------+--------+
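For comparison, the SQL idea described above can be translated into pandas fairly literally. The steps are: join on Cust_ID, apply the one-sided timestamp condition, number the candidate requests per record, keep number 1, then left-join back. This is a minimal sketch on the sample data above. The names records, requests, pairs and best are illustrative. ROW_NUMBER() is emulated with sort_values plus GroupBy.cumcount, which is a swapped-in technique and not something from the question. Because it builds every record/request pair per customer, memory grows with the number of requests per customer.

```python
import pandas as pd

# Sample data from the question, with Timestamp parsed as datetime
records = pd.DataFrame({
    "Cust_ID": [1, 1, 1, 2, 3],
    "Name": ["A", "A", "A", "B", "C"],
    "Timestamp": pd.to_datetime(["2013-01-01", "2014-01-01", "2015-12-01",
                                 "2014-01-01", "2014-01-01"]),
})
requests = pd.DataFrame({
    "Req_ID": [1, 2, 3, 4],
    "Cust_ID": [1, 1, 1, 2],
    "Timestamp": pd.to_datetime(["2013-01-01", "2013-12-01", "2015-01-01",
                                 "2013-01-01"]),
})

# 1. Pair every record with every request of the same customer,
#    keeping the record's row label in the "index" column.
pairs = records.reset_index().merge(requests, on="Cust_ID", suffixes=("", "_req"))

# 2. One-sided condition: only requests at or before the record's timestamp.
pairs = pairs[pairs["Timestamp_req"] <= pairs["Timestamp"]]

# 3. Number the surviving requests per record, newest first
#    (like ROW_NUMBER() OVER (PARTITION BY record ORDER BY request ts DESC)).
pairs = pairs.sort_values("Timestamp_req", ascending=False)
pairs = pairs.assign(rn=pairs.groupby("index").cumcount() + 1)

# 4. Keep row number 1 and left-join it back onto all records, so each record
#    gets at most one Req_ID and records without a match get NaN.
best = pairs.loc[pairs["rn"] == 1].set_index("index")[["Req_ID"]]
merged = records.join(best)
print(merged)
```

Req_ID comes out as float here because NaN is introduced for customer 3.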
Answer 0 (score: 1)
Use pandas.merge_asof. You only need to convert the Timestamp columns to datetime and sort both DataFrames by Timestamp:
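import pandas as pd
# Parse the Timestamp columns as datetimes
Customer_records['Timestamp'] = pd.to_datetime(Customer_records['Timestamp'])
customer_request['Timestamp'] = pd.to_datetime(customer_request['Timestamp'])
# merge_asof requires both frames to be sorted by the 'on' key
Customer_records = Customer_records.sort_values('Timestamp')
customer_request = customer_request.sort_values('Timestamp')
# For each record, take the latest request of the same Cust_ID with Timestamp <= the record's Timestamp
df = pd.merge_asof(Customer_records, customer_request, on='Timestamp', by='Cust_ID')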
   Cust_ID Name  Timestamp  Req_ID
0        1    A 2013-01-01     1.0
1        1    A 2014-01-01     2.0
2        2    B 2014-01-01     4.0
3        3    C 2014-01-01     NaN
4        1    A 2015-12-01     3.0
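The snippet above assumes the two DataFrames already exist. Below is a self-contained sketch that builds the sample tables from the question and runs the same merge_asof call. Two steps are additions and not part of the original answer. The conversion to pandas' nullable "Int64" dtype keeps Req_ID integral and shows <NA> instead of a float NaN. The final sort reproduces the row order of the merged table in the question.

```python
import pandas as pd

# Sample data from the question
records = pd.DataFrame({
    "Cust_ID": [1, 1, 1, 2, 3],
    "Name": ["A", "A", "A", "B", "C"],
    "Timestamp": pd.to_datetime(["2013-01-01", "2014-01-01", "2015-12-01",
                                 "2014-01-01", "2014-01-01"]),
})
requests = pd.DataFrame({
    "Req_ID": [1, 2, 3, 4],
    "Cust_ID": [1, 1, 1, 2],
    "Timestamp": pd.to_datetime(["2013-01-01", "2013-12-01", "2015-01-01",
                                 "2013-01-01"]),
})

# merge_asof needs both inputs sorted by the "on" column
records = records.sort_values("Timestamp")
requests = requests.sort_values("Timestamp")

# direction="backward" (the default) picks, for each record, the last request
# of the same Cust_ID; allow_exact_matches=True (also the default) makes the
# comparison <= rather than <
merged = pd.merge_asof(records, requests, on="Timestamp", by="Cust_ID",
                       direction="backward")

# Addition: nullable integer dtype so unmatched rows show <NA>, not a float NaN
merged["Req_ID"] = merged["Req_ID"].astype("Int64")

# Addition: restore the row order used in the question's 'merged' table
merged = merged.sort_values(["Cust_ID", "Timestamp"], ignore_index=True)
print(merged)
```

Because merge_asof matches at most one request per left row, the 1:1 requirement from the question is built in. Unlike the SQL ranking approach, it never materializes all candidate pairs.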