I want to build a continuous date table for each customer.
Let's assume I have this data frame:
import pandas as pd
import pyodbc

con = pyodbc.connect(....)
I use dateadd(day,-1,getdate()) because the most recent data in the table is yesterday's.
# Everything before yesterday
SQL_Until_Today = pd.read_sql_query("Select date, customer,value from account where date < convert(date,dateadd(day,-1,getdate()))", con)
account = pd.DataFrame(SQL_Until_Today, columns=['date', 'customer', 'value'])
# Yesterday only
SQL_Today = pd.read_sql_query("Select date, customer,value from account where date = convert(date,dateadd(day,-1,getdate()))", con)
account_Today = pd.DataFrame(SQL_Today, columns=['date', 'customer', 'value'])
# DataFrame.append was removed in pandas 2.0; pd.concat does the same job
account = pd.concat([account, account_Today])
From these two queries I end up with a data frame named account, which looks like this:
date customer value
2019-06-27 100 40
2019-06-28 100 30
2019-06-30 100 20
2019-07-01 100 10
2019-07-02 100 18
2019-06-21 200 460
2019-06-23 200 430
2019-06-24 200 410
2019-06-25 200 130
2019-06-26 200 210
2019-06-27 200 410
2019-06-28 200 310
2019-06-30 200 210
2019-07-01 200 110
2019-07-02 200 118
I need to create a continuous date table for each customer, starting from that customer's min_date in the table.
For example:
customer = 100 --> 2019-06-27
customer = 200 --> 2019-06-21
So the output I want for the account data frame is:
date customer value
2019-06-27 100 40
2019-06-28 100 30
2019-06-29 100 30 *************** the closest earlier value
2019-06-30 100 20
2019-07-01 100 10
2019-07-02 100 18
2019-07-03 100 18 *************** the closest earlier value
2019-06-21 200 460
2019-06-22 200 460 *************** the closest earlier value
2019-06-23 200 430
2019-06-24 200 410
2019-06-25 200 130
2019-06-26 200 210
2019-06-27 200 410
2019-06-28 200 310
2019-06-29 200 310 *************** the closest earlier value
2019-06-30 200 210
2019-07-01 200 110
2019-07-02 200 118
2019-07-03 200 118 *************** the closest earlier value
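If there is a gap between two dates, I want to fill it with the value from the most recent earlier date.
How can I do this efficiently? Any help is appreciated.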
Answer 0 (score: 0)
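A common approach is to use a separate "date table" containing one row per valid date and covering (or exceeding) the range you need to query. For example, in this particular case a table like the following would be enough: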
date_table
date
----------
2019-06-15
2019-06-16
2019-06-17
2019-06-18
2019-06-19
2019-06-20
2019-06-21
2019-06-22
2019-06-23
2019-06-24
2019-06-25
2019-06-26
2019-06-27
2019-06-28
2019-06-29
2019-06-30
2019-07-01
2019-07-02
2019-07-03
2019-07-04
2019-07-05
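One way to populate such a table is sketched below. This is only an illustration under some assumptions: it uses an in-memory SQLite database through Python's sqlite3 module so that it runs on its own (the question uses SQL Server through pyodbc), it stores dates as ISO-formatted text, and the helper name build_date_table is made up for this example.

```python
import sqlite3
from datetime import date, timedelta


def build_date_table(con, start, end):
    """Create date_table with one ISO-formatted row per day in [start, end].

    Hypothetical helper for illustration; not part of the original answer.
    """
    con.execute("CREATE TABLE date_table (date TEXT PRIMARY KEY)")
    n_days = (end - start).days + 1
    rows = [((start + timedelta(days=i)).isoformat(),) for i in range(n_days)]
    con.executemany("INSERT INTO date_table (date) VALUES (?)", rows)
    con.commit()


# Assumption: SQLite stands in for the real database here.
con = sqlite3.connect(":memory:")
build_date_table(con, date(2019, 6, 15), date(2019, 7, 5))

dates = [r[0] for r in con.execute("SELECT date FROM date_table ORDER BY date")]
print(len(dates), dates[0], dates[-1])
```

In practice a date table is usually built once for a wide range (several years) and reused across queries, which is why it may safely exceed the range you need.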
Given your existing data
account
date customer value
---------- -------- -----
2019-06-27 100 40
2019-06-28 100 30
2019-06-30 100 20
2019-07-01 100 10
2019-07-02 100 18
2019-06-21 200 460
2019-06-23 200 430
2019-06-24 200 410
2019-06-25 200 130
2019-06-26 200 210
2019-06-27 200 410
2019-06-28 200 310
2019-06-30 200 210
2019-07-01 200 110
2019-07-02 200 118
you would start with a query that produces every actual date for every customer
SELECT date_table.date AS actual_date, cust.customer
FROM
date_table,
(SELECT DISTINCT account.customer FROM account) cust
WHERE
date_table.date >= (SELECT MIN(account.date) FROM account)
AND
date_table.date <= (SELECT MAX(account.date) FROM account)
Next, wrap the above as a subquery (named cust_date) to determine the reference date for each customer and actual date
SELECT cust_date.actual_date AS actual_date, cust_date.customer, MAX(acc.date) AS reference_date
FROM
(
SELECT date_table.date AS actual_date, cust.customer
FROM
date_table,
(SELECT DISTINCT account.customer FROM account) cust
WHERE
date_table.date >= (SELECT MIN(account.date) FROM account)
AND
date_table.date <= (SELECT MAX(account.date) FROM account)
) cust_date
INNER JOIN
account acc
ON acc.customer = cust_date.customer AND acc.date <= cust_date.actual_date
GROUP BY cust_date.actual_date, cust_date.customer
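To see what this intermediate query returns, here is a small sketch that runs it against a trimmed sample. The assumptions are mine, not the answer's: an in-memory SQLite database instead of SQL Server, ISO text dates, only a handful of rows, and an added ORDER BY so the result is deterministic.

```python
import sqlite3

# Assumption: SQLite stands in for the real database; the sample is trimmed.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE date_table (date TEXT);
CREATE TABLE account (date TEXT, customer INTEGER, value INTEGER);
INSERT INTO date_table VALUES
  ('2019-06-25'), ('2019-06-26'), ('2019-06-27'), ('2019-06-28'),
  ('2019-06-29'), ('2019-06-30'), ('2019-07-01');
INSERT INTO account VALUES
  ('2019-06-27', 100, 40), ('2019-06-28', 100, 30), ('2019-06-30', 100, 20),
  ('2019-06-26', 200, 210), ('2019-06-27', 200, 410);
""")

query = """
SELECT cust_date.actual_date, cust_date.customer, MAX(acc.date) AS reference_date
FROM (
    SELECT date_table.date AS actual_date, cust.customer
    FROM date_table, (SELECT DISTINCT customer FROM account) cust
    WHERE date_table.date >= (SELECT MIN(date) FROM account)
      AND date_table.date <= (SELECT MAX(date) FROM account)
) cust_date
INNER JOIN account acc
    ON acc.customer = cust_date.customer AND acc.date <= cust_date.actual_date
GROUP BY cust_date.actual_date, cust_date.customer
ORDER BY cust_date.customer, cust_date.actual_date
"""
rows = con.execute(query).fetchall()
for row in rows:
    print(row)
```

Two things are worth noticing in the result. The gap date 2019-06-29 for customer 100 gets reference_date 2019-06-28, and the inner join drops 2019-06-26 for customer 100 because that customer has no earlier row, so each customer still starts at its own first date. The upper bound, however, is the overall MAX(date), so customer 200 is carried forward to 2019-06-30 in this sample.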
Finally, wrap that as a subquery (named ref_date) to pull the value that belongs to each reference_date
SELECT ref_date.actual_date, ref_date.customer, acc.value
FROM
(
SELECT cust_date.actual_date AS actual_date, cust_date.customer, MAX(acc.date) AS reference_date
FROM
(
SELECT date_table.date AS actual_date, cust.customer
FROM
date_table,
(SELECT DISTINCT account.customer FROM account) cust
WHERE
date_table.date >= (SELECT MIN(account.date) FROM account)
AND
date_table.date <= (SELECT MAX(account.date) FROM account)
) cust_date
INNER JOIN
account acc
ON acc.customer = cust_date.customer AND acc.date <= cust_date.actual_date
GROUP BY cust_date.actual_date, cust_date.customer
) ref_date
INNER JOIN
account acc
ON acc.customer = ref_date.customer AND acc.date = ref_date.reference_date
ORDER BY ref_date.customer, ref_date.actual_date
which produces
actual_date customer value
----------- -------- -----
2019-06-27 100 40
2019-06-28 100 30
2019-06-29 100 30
2019-06-30 100 20
2019-07-01 100 10
2019-07-02 100 18
2019-06-21 200 460
2019-06-22 200 460
2019-06-23 200 430
2019-06-24 200 410
2019-06-25 200 130
2019-06-26 200 210
2019-06-27 200 410
2019-06-28 200 310
2019-06-29 200 310
2019-06-30 200 210
2019-07-01 200 110
2019-07-02 200 118
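Since the question already has the data in a pandas data frame, the same fill can also be sketched on the pandas side without a date table. This is a minimal sketch, not a tuned implementation: the function name fill_daily and its end parameter are made up for this example, and it assumes at most one row per customer per date. Passing end extends every customer to a fixed last day, such as the 2019-07-03 rows in the question's desired output, which the SQL result above does not include.

```python
import pandas as pd


def fill_daily(account, end=None):
    """Return one row per customer per calendar day, forward-filling value.

    Hypothetical helper; assumes (customer, date) pairs are unique.
    """
    account = account.copy()
    account["date"] = pd.to_datetime(account["date"])
    pieces = []
    for customer, grp in account.groupby("customer"):
        # Stop at the customer's own last date unless a common end is given.
        last = grp["date"].max() if end is None else pd.Timestamp(end)
        days = pd.date_range(grp["date"].min(), last, freq="D")
        filled = (
            grp.set_index("date")["value"]
            .reindex(days)          # missing days appear as NaN
            .ffill()                # carry the closest earlier value forward
            .astype(grp["value"].dtype)
            .rename_axis("date")
            .reset_index()
        )
        filled.insert(1, "customer", customer)
        pieces.append(filled)
    return pd.concat(pieces, ignore_index=True)


sample = pd.DataFrame({
    "date": ["2019-06-27", "2019-06-28", "2019-06-30"],
    "customer": [100, 100, 100],
    "value": [40, 30, 20],
})
out = fill_daily(sample, end="2019-07-01")
print(out)
```

Looping over groups keeps the logic easy to follow; for a very large number of customers a vectorized variant may be worth exploring.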