Why does adding a simple field to a SELECT change the number of rows returned in SQL?

Time: 2019-05-29 14:29:44

Tags: sql-server tsql

I have a more complex problem that I am trying to isolate.

I have a simple query that returns all distinct customer emails (so, one row per customer):

Select distinct 
CustomerEmail

FROM [JMNYC-AMTDB].[AMTPLUS].[dbo].Invoices I (nolock) --I don't think the tables are relevant to the problem. 
                LEFT JOIN (SELECT 
                                ID.Company_Code
                                ,ID.Division_Code
                                ,ID.Invoice_Number
                                ,SUM (ID.Price* ID.Quantity) Total
                            FROM [JMNYC-AMTDB].[AMTPLUS].[dbo].Invoices_Detail ID (nolock)
                            GROUP BY ID.Company_Code, ID.Division_Code, ID.Invoice_Number) ID 
                        ON I.Company_Code = ID.Company_Code
                        AND I.Division_Code = ID.Division_Code
                        AND I.Invoice_Number = ID.Invoice_Number
                LEFT JOIN 
                    [JMDNJ-ACCELSQL].[A1WAREHOUSE].[dbo].SHIPHIST SH (nolock) ON I.Pickticket_Number = SH.Packslip
                LEFT JOIN 
                    [JMDNJ-ACCELSQL].[A1WAREHOUSE].[dbo].[SpraygroundMagentoCustomerEmailData] S on SH.CUST_PO = S.InvoiceNumber

Where I.Company_Code ='09' AND I.Division_Code = '001'
AND I.Customer_Number = 'ECOM2X'
AND ISNUMERIC(SH.CUST_PO) <> 0 
AND I.Date_Created BETWEEN DATEADD(month, -0, '6/1/2016') AND '1/1/2017'  -- Orders Base default is 12 months, options are 6,12, 18, and 24

This returns 19,516 rows.

However, if I add a second simple column to the select list,

Select distinct 
Month(I.Date_Created) Month,
CustomerEmail
FROM [JMNYC-AMTDB].[AMTPLUS].[dbo].Invoices I (nolock)...

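it now returns 20,452 rows.

Writing this question out, I think I understand the problem. The email is repeated for each different month. So if a customer placed orders in June and in July, their email appears twice: once for month 6 and once for month 7.

So this number should be more correct than the 19,516 figure, right?

Later on, in the more complex query, I calculate the TotalCustomers number with a simple DENSE_RANK expression: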

,DENSE_RANK() over (order by CustomerEmail asc) 
+DENSE_RANK() over (order by CustomerEmail desc) 
- 1 as TotalCustomersOverRange

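This returns 19,516, because it does not count multiple purchases by the same customer. That is technically correct too, since there are fewer unique customers over the whole date range. Only when you break the data down by month is the same repeat customer counted more than once.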
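To make the difference concrete, here is a minimal sketch on made-up data (the `Orders` rows, the `OrderMonth` column, and the email addresses are hypothetical, not taken from the real tables). It applies the same ascending-plus-descending `DENSE_RANK` trick once over the whole set and once partitioned by month.

```sql
-- Hypothetical sample data: three orders from two customers.
WITH Orders AS (
    SELECT *
    FROM (VALUES ('a@example.com', 6),
                 ('a@example.com', 7),
                 ('b@example.com', 6)) V (CustomerEmail, OrderMonth)
)
SELECT CustomerEmail,
       OrderMonth,
       -- Rank counted from one end plus rank counted from the other end,
       -- minus 1, equals the number of distinct values in the window.
       DENSE_RANK() OVER (ORDER BY CustomerEmail ASC)
     + DENSE_RANK() OVER (ORDER BY CustomerEmail DESC)
     - 1 AS TotalCustomers,
       -- The same trick, restarted for every month.
       DENSE_RANK() OVER (PARTITION BY OrderMonth ORDER BY CustomerEmail ASC)
     + DENSE_RANK() OVER (PARTITION BY OrderMonth ORDER BY CustomerEmail DESC)
     - 1 AS CustomersThisMonth
FROM Orders;
```

On this sample, `TotalCustomers` is 2 on every row, while `CustomersThisMonth` is 2 for month 6 and 1 for month 7, so the monthly figures add up to 3. One caveat worth keeping in mind: `DENSE_RANK` ranks NULL as a value of its own, so if the LEFT JOINs leave `CustomerEmail` NULL on some rows, this trick counts one more than `COUNT(DISTINCT CustomerEmail)` would.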

What is the best way to resolve this? Here is my full query:

--Calculate average amount of time between purchase
--Calculate percentage of quantity and total increase with each purchase.
--Return most valued customers.  
--User defined base range
-- later on, more refined user defined customer base, so if the base range is 18 months and the customer range is 1 month, it will only check the data against customers that purchased orders within the last month. 
-- over the customer range, we define who the customers are. We call this RANGE
-- over the orders base range, we define how many times they ordered. We call this BASE.
-- First we filter by month, returning total new orders and total recurring orders
-- (FOR OTHER REPORT, filter by state and not month)
-- Then within the month, we drill down to calculate how many customers have one order, two orders, three orders, etc. in total
-- For each order amount, we calculate average days between orders, total value, lifetime value, and quantity changes


SELECT DISTINCT --*
Month

,(DENSE_RANK() over (partition by Month order by CustomerEmail asc)
    +DENSE_RANK() over (partition by Month order by CustomerEmail desc))
    -1 as TotalCustomersThisMonth

,Sum(Case When AmountOrdersOverRangeByCustomer = 1 and RangeOrderNumber = 1 then 1 else 0 end) over (partition by Month) NewCustomersOverRangeThisMonth --Some of those customers aren't really new, if we expand to the base.
,Sum(Case When AmountOrdersOverRangeByCustomer = 1 and AmountOrdersOverBaseByCustomer = 1 then 1 else 0 end) over (partition by Month) NewCustomersOverBaseThisMonth
,Sum(Case When AmountOrdersOverRangeByCustomer = 1 and AmountOrdersOverBaseByCustomer > 1 then 1 else 0 end)  over (partition by Month) RecurringCustomersOverBaseButNewInRangeThisMonth -- Customers in Base who are not in range.
,Sum(Case When AmountOrdersOverRangeByCustomer > 1 and RangeOrderNumber =1 then 1 else 0 end)  over (partition by Month) RecurringCustomerOverRangeThisMonth
,TTT.NewCustomersOverRange
,TTT.NewCustomersOverBase
,TTT.RecurringCustomersOverBaseButNewInRange
,TTT.RecurringCustomerOverRange
,TTT.TotalCustomersOverBase
,TTT.TotalCustomersOverRange

FROM --This table calculates new and recurring customers.
(
    SELECT  
    *
    ,Sum(Case When AmountOrdersOverRangeByCustomer = 1 and RangeOrderNumber = 1 then 1 else 0 end) over () NewCustomersOverRange --Some of those customers aren't really new, if we expand to the base.
    ,Sum(Case When AmountOrdersOverRangeByCustomer = 1 and AmountOrdersOverBaseByCustomer = 1 then 1 else 0 end) over () NewCustomersOverBase 
    ,Sum(Case When AmountOrdersOverRangeByCustomer = 1 and AmountOrdersOverBaseByCustomer > 1 then 1 else 0 end)  over () RecurringCustomersOverBaseButNewInRange -- Customers in Base who are not in range
    ,Sum(Case When AmountOrdersOverRangeByCustomer > 1 and RangeOrderNumber =1 then 1 else 0 end)  over () RecurringCustomerOverRange

    FROM -- This table gives you Order Numbers Per Customer
    (
        SELECT 
            *
            ,ROW_NUMBER() over (partition by CustomerEmail order by Date_Created asc) RangeOrderNumber

            ,(DENSE_RANK() over (partition by CustomerEmail order by Date_Created asc)
            +DENSE_RANK() over (partition by CustomerEmail order by Date_Created desc))
            -1 as AmountOrdersOverRangeByCustomer

            ,Max(BaseOrderNumber) over (partition by CustomerEmail) AmountOrdersOverBaseByCustomer

            ,DENSE_RANK() over (order by CustomerEmail asc) 
            +DENSE_RANK() over (order by CustomerEmail desc) 
            - 1 as TotalCustomersOverRange

        FROM --This table gives you a line by line basis of every order
        (
            Select 
             I.Date_Created
            ,I.Company_Code
            ,I.Division_Code
            ,I.Invoice_Number
            ,Sh.CUST_PO
            ,I.Total_Quantity
            ,ID.Total
            ,SH.Ship_City City
            ,CASE WHEN SH.Ship_Cntry <> 'US' THEN 'INT' ELSE SH.Ship_prov END State
            ,SH.Ship_Zip Zip
            ,SH.Ship_Cntry Country
            ,Month(I.Date_Created) Month
            ,S.CustomerEmail
            ,Count(*) over (partition by CustomerEmail order by Date_Created asc) BaseOrderNumber

            ,dense_rank() over (order by CustomerEmail) 
            + dense_rank() over (order by CustomerEmail desc) 
            - 1 as TotalCustomersOverBase
            --,Count(Distinct CustomerEmail) over () as TotalCustomersOverBase
            --,ROW_NUMBER() over (partition by S.CustomerEmail order by Date_Created asc) PurchaseCount --this goes somewhere else

            --,(DENSE_RANK() over (partition by Month(I.Date_Created) order by CustomerEmail asc)
            --+DENSE_RANK() over (partition by Month(I.Date_Created) order by CustomerEmail desc))
            ---1 as TotalCustomersThisMonth

            FROM [JMNYC-AMTDB].[AMTPLUS].[dbo].Invoices I (nolock)
                LEFT JOIN (SELECT 
                                ID.Company_Code
                                ,ID.Division_Code
                                ,ID.Invoice_Number
                                ,SUM (ID.Price* ID.Quantity) Total
                            FROM [JMNYC-AMTDB].[AMTPLUS].[dbo].Invoices_Detail ID (nolock)
                            GROUP BY ID.Company_Code, ID.Division_Code, ID.Invoice_Number) ID 
                        ON I.Company_Code = ID.Company_Code
                        AND I.Division_Code = ID.Division_Code
                        AND I.Invoice_Number = ID.Invoice_Number
                LEFT JOIN 
                    [JMDNJ-ACCELSQL].[A1WAREHOUSE].[dbo].SHIPHIST SH (nolock) ON I.Pickticket_Number = SH.Packslip
                LEFT JOIN 
                    [JMDNJ-ACCELSQL].[A1WAREHOUSE].[dbo].[SpraygroundMagentoCustomerEmailData] S on SH.CUST_PO = S.InvoiceNumber

            Where I.Company_Code ='09' AND I.Division_Code = '001'
            AND I.Customer_Number = 'ECOM2X'
            AND ISNUMERIC(SH.CUST_PO) <> 0 
            AND I.Date_Created BETWEEN DATEADD(month, -12, '1/1/2017') AND '12/31/2016'  -- Orders Base default is 12 months, options are 6,12, 18, and 24
            --AND CustomerEmail is NULL
        )T
        Where  T.Date_Created BETWEEN '6/1/2016' AND '1/1/2017'-- Customer Range
    )TT

    --
    --Order By CustomerEmail, RangeOrderNumber asc 
    --

)TTT

--Order By Date_Created desc 
--Order By CustomerEmail, RangeOrderNumber asc 
Order By Month 

1 Answer:

Answer 0: (score: 2)

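When you use the DISTINCT operator, the returned dataset contains only distinct rows. This means that if 2 or more rows hold the same value in every column, that row is returned only once.

In the first statement you return only the value of CustomerEmail, so you get a dataset with one row for each unique value of CustomerEmail.

In the second statement you have Month(I.Date_Created) and CustomerEmail, so you get one row for each unique combination of those two values. This means that, behind your first dataset, there is more than 1 row for some particular values of CustomerEmail, and those rows have at least 2 different values for the expression Month(I.Date_Created) once you add it.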
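A tiny illustration with made-up rows may help (the table variable `@Orders` and its contents are hypothetical and only stand in for the joined result of the real query):

```sql
-- Hypothetical sample: customer "a" ordered in June and July, customer "b" only in June.
DECLARE @Orders TABLE (CustomerEmail varchar(100), Date_Created date);

INSERT INTO @Orders (CustomerEmail, Date_Created)
VALUES ('a@example.com', '20160615'),
       ('a@example.com', '20160720'),
       ('b@example.com', '20160602');

-- Distinct on one column: one row per email.
SELECT DISTINCT CustomerEmail
FROM @Orders;

-- Distinct on two columns: one row per (month, email) combination.
SELECT DISTINCT MONTH(Date_Created) AS OrderMonth, CustomerEmail
FROM @Orders;
```

The first SELECT returns 2 rows and the second returns 3, because `a@example.com` appears once for month 6 and once for month 7. The gap between 19,516 and 20,452 in the question is the same effect at a larger scale.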

As a simple demonstration, consider the following statement:

WITH N AS(
    SELECT *
    FROM (VALUES(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL)) N (N)),
Tally AS(
    SELECT LEFT(NEWID(),1) AS C,
           ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS I
    FROM N N1, N N2, N N3, N N4, N N5)
SELECT DISTINCT C
FROM Tally;

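Even though the Tally CTE generates 100,000 rows, you will most likely get a dataset of only 16 rows, one for each hexadecimal character. Now let's put both columns in the DISTINCT: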

WITH N AS(
    SELECT *
    FROM (VALUES(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL)) N (N)),
Tally AS(
    SELECT LEFT(NEWID(),1) AS C,
           ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS I
    FROM N N1, N N2, N N3, N N4, N N5)
SELECT DISTINCT C,I
FROM Tally;

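Now, although the underlying data has not really changed (new values of NEWID are generated on each run, but that does not matter for the purposes of the example), you get 100,000 rows, because every value of I is unique and so every combination of C and I is unique as well.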
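If the aim is to report both figures, one possible approach is to count distinct emails once per month and once over the whole range. This is only a sketch on made-up data (`@SampleOrders` and its rows are hypothetical), not a rewrite of the full query from the question:

```sql
-- Hypothetical sample: customer "a" ordered in June and July, customer "b" only in June.
DECLARE @SampleOrders TABLE (CustomerEmail varchar(100), Date_Created date);

INSERT INTO @SampleOrders (CustomerEmail, Date_Created)
VALUES ('a@example.com', '20160615'),
       ('a@example.com', '20160720'),
       ('b@example.com', '20160602');

-- Customers per month: a repeat customer is counted once in each month they ordered.
SELECT MONTH(Date_Created) AS OrderMonth,
       COUNT(DISTINCT CustomerEmail) AS CustomersThisMonth
FROM @SampleOrders
GROUP BY MONTH(Date_Created)
ORDER BY OrderMonth;

-- Customers over the whole range: each customer is counted once.
SELECT COUNT(DISTINCT CustomerEmail) AS CustomersOverRange
FROM @SampleOrders;
```

Here the monthly counts are 2 for month 6 and 1 for month 7, while the count over the range is 2. Both numbers are correct; they answer different questions, so the monthly counts should not be expected to sum to the range total. Note that `MONTH()` alone merges the same calendar month from different years, so for a range longer than 12 months you would probably want to group by `YEAR(Date_Created)` as well.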