Query to fill in missing dates with the value from the previous date

Asked: 2019-07-03 10:14:08

Tags: python sql pandas pycharm pyodbc

I want to build a continuous date table for each customer.

Let's assume I have this dataframe:

import pandas as pd
import pyodbc

con = pyodbc.connect(....)

I use dateadd(day,-1,getdate()) because the most recent data in the table is for yesterday, relative to getdate().

# read_sql_query already returns a DataFrame; only the column selection is needed
SQL_Until_Today = pd.read_sql_query("Select date, customer, value from account where date < convert(date, dateadd(day, -1, getdate()))", con)
account = SQL_Until_Today[['date', 'customer', 'value']]

SQL_Today = pd.read_sql_query("Select date, customer, value from account where date = convert(date, dateadd(day, -1, getdate()))", con)
account_Today = SQL_Today[['date', 'customer', 'value']]

# DataFrame.append was removed in pandas 2.0; pd.concat is its replacement
account = pd.concat([account, account_Today], ignore_index=True)

From these two I end up with a single dataframe named account, which looks like this:

date         customer value
2019-06-27    100       40
2019-06-28    100       30
2019-06-30    100       20
2019-07-01    100       10
2019-07-02    100       18
2019-06-21    200       460
2019-06-23    200       430
2019-06-24    200       410
2019-06-25    200       130
2019-06-26    200       210
2019-06-27    200       410
2019-06-28    200       310
2019-06-30    200       210
2019-07-01    200       110
2019-07-02    200       118

For each customer, I need to create a continuous date table starting from that customer's min_date in the table.

For example:

customer = 100 --> 2019-06-27
customer = 200 --> 2019-06-21

So the output I want for the account dataframe is:

date         customer value
2019-06-27    100       40
2019-06-28    100       30
2019-06-29    100       30  *** carried forward from the nearest earlier date
2019-06-30    100       20
2019-07-01    100       10
2019-07-02    100       18
2019-07-03    100       18  *** carried forward from the nearest earlier date
2019-06-21    200       460
2019-06-22    200       460 *** carried forward from the nearest earlier date
2019-06-23    200       430
2019-06-24    200       410
2019-06-25    200       130
2019-06-26    200       210
2019-06-27    200       410
2019-06-28    200       310
2019-06-29    200       310 *** carried forward from the nearest earlier date
2019-06-30    200       210
2019-07-01    200       110
2019-07-02    200       118
2019-07-03    200       118 *** carried forward from the nearest earlier date

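Whenever there is a gap between two dates, I want each missing date to take the value from the nearest earlier date.

How can I do this efficiently? Any help is appreciated.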

1 Answer:

Answer 0 (score: 0)

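A common approach is to use a separate "date table" that contains one row for each valid date and covers (or extends beyond) the range you need to query. In this particular case, for example, the following table would suffice: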

date_table

date      
----------
2019-06-15
2019-06-16
2019-06-17
2019-06-18
2019-06-19
2019-06-20
2019-06-21
2019-06-22
2019-06-23
2019-06-24
2019-06-25
2019-06-26
2019-06-27
2019-06-28
2019-06-29
2019-06-30
2019-07-01
2019-07-02
2019-07-03
2019-07-04
2019-07-05
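
If no such table exists yet, its rows can be generated rather than typed by hand. The following is a minimal sketch using only the Python standard library; the helper name build_date_rows is hypothetical, and the range simply mirrors the example table above.

```python
from datetime import date, timedelta

def build_date_rows(start, end):
    """Return one ISO-formatted date string per calendar day in [start, end]."""
    days = (end - start).days
    return [(start + timedelta(days=i)).isoformat() for i in range(days + 1)]

# Same range as the example date_table: 2019-06-15 through 2019-07-05
date_rows = build_date_rows(date(2019, 6, 15), date(2019, 7, 5))
print(date_rows[0], date_rows[-1], len(date_rows))
```

The resulting strings can then be inserted into date_table with whatever database driver is in use. ISO-formatted strings sort in calendar order, which matters if the column is stored as text.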

Given your existing data

account

date        customer  value
----------  --------  -----
2019-06-27       100     40
2019-06-28       100     30
2019-06-30       100     20
2019-07-01       100     10
2019-07-02       100     18
2019-06-21       200    460
2019-06-23       200    430
2019-06-24       200    410
2019-06-25       200    130
2019-06-26       200    210
2019-06-27       200    410
2019-06-28       200    310
2019-06-30       200    210
2019-07-01       200    110
2019-07-02       200    118

you would start with a query that produces every actual date for every customer

SELECT date_table.date AS actual_date, cust.customer
FROM 
    date_table,
    (SELECT DISTINCT account.customer FROM account) cust
WHERE 
    date_table.date >= (SELECT MIN(account.date) FROM account)
    AND
    date_table.date <= (SELECT MAX(account.date) FROM account)

Next, wrap the above as a subquery (named cust_date) to determine the reference date for each customer and actual date

SELECT cust_date.actual_date AS actual_date, cust_date.customer, MAX(acc.date) AS reference_date
FROM 
    (
        SELECT date_table.date AS actual_date, cust.customer
        FROM 
            date_table,
            (SELECT DISTINCT account.customer FROM account) cust
        WHERE 
            date_table.date >= (SELECT MIN(account.date) FROM account)
            AND
            date_table.date <= (SELECT MAX(account.date) FROM account)
    ) cust_date
    INNER JOIN 
    account acc 
        ON acc.customer = cust_date.customer AND acc.date <= cust_date.actual_date
GROUP BY cust_date.actual_date, cust_date.customer
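
To see what these two steps do, the queries can be tried against an in-memory SQLite database. This is only a sketch: SQLite, the TEXT date columns, and the variable names are assumptions made for illustration, not something the question specifies (the asker connects through pyodbc). Note that the first query uses the overall MIN and MAX dates, so it also yields rows for customer 100 before that customer's first date; the INNER JOIN in the second query drops them because no account row has an earlier or equal date.

```python
import sqlite3
from datetime import date, timedelta

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE account (date TEXT, customer INTEGER, value INTEGER)")
con.execute("CREATE TABLE date_table (date TEXT)")
con.executemany("INSERT INTO account VALUES (?, ?, ?)", [
    ("2019-06-27", 100, 40), ("2019-06-28", 100, 30), ("2019-06-30", 100, 20),
    ("2019-07-01", 100, 10), ("2019-07-02", 100, 18),
    ("2019-06-21", 200, 460), ("2019-06-23", 200, 430), ("2019-06-24", 200, 410),
    ("2019-06-25", 200, 130), ("2019-06-26", 200, 210), ("2019-06-27", 200, 410),
    ("2019-06-28", 200, 310), ("2019-06-30", 200, 210), ("2019-07-01", 200, 110),
    ("2019-07-02", 200, 118),
])
# One row per day from 2019-06-15 through 2019-07-05 (21 days)
first_day = date(2019, 6, 15)
con.executemany("INSERT INTO date_table VALUES (?)",
                [((first_day + timedelta(days=i)).isoformat(),) for i in range(21)])

# Step 1: every date in the overall range, paired with every customer
CUST_DATE = """
SELECT date_table.date AS actual_date, cust.customer
FROM
    date_table,
    (SELECT DISTINCT account.customer FROM account) cust
WHERE
    date_table.date >= (SELECT MIN(account.date) FROM account)
    AND
    date_table.date <= (SELECT MAX(account.date) FROM account)
"""
step1 = con.execute(CUST_DATE).fetchall()

# Step 2: latest account date on or before each actual date, per customer
step2 = con.execute(f"""
SELECT cust_date.actual_date AS actual_date, cust_date.customer, MAX(acc.date) AS reference_date
FROM ({CUST_DATE}) cust_date
    INNER JOIN account acc
        ON acc.customer = cust_date.customer AND acc.date <= cust_date.actual_date
GROUP BY cust_date.actual_date, cust_date.customer
""").fetchall()
con.close()

# Map (actual_date, customer) -> reference_date for easy lookup
reference = {(d, c): ref for d, c, ref in step2}
print(len(step1), len(step2))
print(reference[("2019-06-29", 100)])  # the missing date points back to an existing one
```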

Finally, wrap that as a subquery (named ref_date) to look up the reference value based on reference_date

SELECT ref_date.actual_date, ref_date.customer, acc.value
FROM
    (
        SELECT cust_date.actual_date AS actual_date, cust_date.customer, MAX(acc.date) AS reference_date
        FROM 
            (
                SELECT date_table.date AS actual_date, cust.customer
                FROM 
                    date_table,
                    (SELECT DISTINCT account.customer FROM account) cust
                WHERE 
                    date_table.date >= (SELECT MIN(account.date) FROM account)
                    AND
                    date_table.date <= (SELECT MAX(account.date) FROM account)
            ) cust_date
            INNER JOIN 
            account acc 
                ON acc.customer = cust_date.customer AND acc.date <= cust_date.actual_date
        GROUP BY cust_date.actual_date, cust_date.customer
    ) ref_date
    INNER JOIN
    account acc
        ON acc.customer = ref_date.customer AND acc.date = ref_date.reference_date
ORDER BY ref_date.customer, ref_date.actual_date

which produces

actual_date  customer  value
-----------  --------  -----
2019-06-27        100     40
2019-06-28        100     30
2019-06-29        100     30
2019-06-30        100     20
2019-07-01        100     10
2019-07-02        100     18
2019-06-21        200    460
2019-06-22        200    460
2019-06-23        200    430
2019-06-24        200    410
2019-06-25        200    130
2019-06-26        200    210
2019-06-27        200    410
2019-06-28        200    310
2019-06-29        200    310
2019-06-30        200    210
2019-07-01        200    110
2019-07-02        200    118
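
The result above stops at 2019-07-02 because the upper bound is MAX(account.date), whereas the output requested in the question runs through 2019-07-03. One possible way to extend it, sketched below, is to make the upper bound an optional parameter. The COALESCE(:upper, ...) wrapper and the run_fill helper are illustrative additions and not part of the query as posted; SQLite with TEXT date columns is again an assumption used only to make the example self-contained.

```python
import sqlite3
from datetime import date, timedelta

ACCOUNT_ROWS = [
    ("2019-06-27", 100, 40), ("2019-06-28", 100, 30), ("2019-06-30", 100, 20),
    ("2019-07-01", 100, 10), ("2019-07-02", 100, 18),
    ("2019-06-21", 200, 460), ("2019-06-23", 200, 430), ("2019-06-24", 200, 410),
    ("2019-06-25", 200, 130), ("2019-06-26", 200, 210), ("2019-06-27", 200, 410),
    ("2019-06-28", 200, 310), ("2019-06-30", 200, 210), ("2019-07-01", 200, 110),
    ("2019-07-02", 200, 118),
]

# The posted query, except that the upper bound falls back to MAX(account.date)
# only when no explicit :upper date is supplied (illustrative change).
FILL_QUERY = """
SELECT ref_date.actual_date, ref_date.customer, acc.value
FROM
    (
        SELECT cust_date.actual_date AS actual_date, cust_date.customer, MAX(acc.date) AS reference_date
        FROM
            (
                SELECT date_table.date AS actual_date, cust.customer
                FROM
                    date_table,
                    (SELECT DISTINCT account.customer FROM account) cust
                WHERE
                    date_table.date >= (SELECT MIN(account.date) FROM account)
                    AND
                    date_table.date <= COALESCE(:upper, (SELECT MAX(account.date) FROM account))
            ) cust_date
            INNER JOIN
            account acc
                ON acc.customer = cust_date.customer AND acc.date <= cust_date.actual_date
        GROUP BY cust_date.actual_date, cust_date.customer
    ) ref_date
    INNER JOIN
    account acc
        ON acc.customer = ref_date.customer AND acc.date = ref_date.reference_date
ORDER BY ref_date.customer, ref_date.actual_date
"""

def run_fill(upper=None):
    """Build the sample tables in memory and return the filled rows."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE account (date TEXT, customer INTEGER, value INTEGER)")
    con.execute("CREATE TABLE date_table (date TEXT)")
    con.executemany("INSERT INTO account VALUES (?, ?, ?)", ACCOUNT_ROWS)
    first_day = date(2019, 6, 15)  # date_table covers 2019-06-15 .. 2019-07-05
    con.executemany("INSERT INTO date_table VALUES (?)",
                    [((first_day + timedelta(days=i)).isoformat(),) for i in range(21)])
    rows = con.execute(FILL_QUERY, {"upper": upper}).fetchall()
    con.close()
    return rows

as_posted = run_fill()                  # upper bound = MAX(account.date)
through_july_3 = run_fill("2019-07-03") # upper bound supplied explicitly
print(len(as_posted), len(through_july_3))
```

With no argument the rows match the table above; with an explicit upper date, each customer gains one extra row carrying the last known value forward. The date table must of course cover whatever upper bound is passed.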