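I have a more complex problem that I am trying to isolate.
I have a simple query that returns every distinct customer email (and therefore every customer):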
Select distinct
CustomerEmail
FROM [JMNYC-AMTDB].[AMTPLUS].[dbo].Invoices I (nolock) --I don't think the tables are relevant to the problem.
LEFT JOIN (SELECT
ID.Company_Code
,ID.Division_Code
,ID.Invoice_Number
,SUM (ID.Price* ID.Quantity) Total
FROM [JMNYC-AMTDB].[AMTPLUS].[dbo].Invoices_Detail ID (nolock)
GROUP BY ID.Company_Code, ID.Division_Code, ID.Invoice_Number) ID
ON I.Company_Code = ID.Company_Code
AND I.Division_Code = ID.Division_Code
AND I.Invoice_Number = ID.Invoice_Number
LEFT JOIN
[JMDNJ-ACCELSQL].[A1WAREHOUSE].[dbo].SHIPHIST SH (nolock) ON I.Pickticket_Number = SH.Packslip
LEFT JOIN
[JMDNJ-ACCELSQL].[A1WAREHOUSE].[dbo].[SpraygroundMagentoCustomerEmailData] S on SH.CUST_PO = S.InvoiceNumber
Where I.Company_Code ='09' AND I.Division_Code = '001'
AND I.Customer_Number = 'ECOM2X'
AND ISNUMERIC(SH.CUST_PO) <> 0
AND I.Date_Created BETWEEN DATEADD(month, -0, '6/1/2016') AND '1/1/2017' -- Orders Base default is 12 months, options are 6,12, 18, and 24
This returns 19,516 rows.
However, if I add a second column to the select list of the query,
Select distinct
Month(I.Date_Created) Month,
CustomerEmail
FROM [JMNYC-AMTDB].[AMTPLUS].[dbo].Invoices I (nolock)...
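it now returns 20,452 rows.
Writing this question out, I think I understand the problem. The query repeats emails across different months. So if a customer places orders in both June and July, his email appears twice: once for month 6 and once for month 7.
So this number should be more correct than the 19,516 figure, right?
Later, in the more complex query, I calculate the TotalCustomers number with a simple DENSE_RANK expression: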
,DENSE_RANK() over (order by CustomerEmail asc)
+DENSE_RANK() over (order by CustomerEmail desc)
- 1 as TotalCustomersOverRange
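This returns 19,516, because it does not count multiple purchases. But that is technically correct too, because there are fewer unique customers over that date range. Only when you break the data down by month does the same repeat customer get counted more than once.
What is the best way to resolve this? Here is my full query: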
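To make the difference between the two counts concrete, here is a minimal sketch on a hypothetical five-row table variable. @Orders and its rows are made up for illustration and stand in for the real joined tables. It applies the same DENSE_RANK expression once over the whole range and once partitioned by month.

```sql
-- Hypothetical sample data: three customers, two of whom order in more than one month.
DECLARE @Orders TABLE (CustomerEmail varchar(50), Date_Created date);

INSERT INTO @Orders (CustomerEmail, Date_Created)
VALUES ('a@example.com', '20160605'),
       ('a@example.com', '20160710'),
       ('b@example.com', '20160620'),
       ('c@example.com', '20160702'),
       ('c@example.com', '20160715');

-- Ascending rank + descending rank - 1 equals the number of distinct values
-- in the window, which emulates a windowed COUNT(DISTINCT ...).
SELECT DISTINCT
       MONTH(Date_Created) AS [Month],
       DENSE_RANK() OVER (PARTITION BY MONTH(Date_Created) ORDER BY CustomerEmail ASC)
     + DENSE_RANK() OVER (PARTITION BY MONTH(Date_Created) ORDER BY CustomerEmail DESC)
     - 1 AS TotalCustomersThisMonth,
       DENSE_RANK() OVER (ORDER BY CustomerEmail ASC)
     + DENSE_RANK() OVER (ORDER BY CustomerEmail DESC)
     - 1 AS TotalCustomersOverRange
FROM @Orders;
-- Month 6: TotalCustomersThisMonth = 2, TotalCustomersOverRange = 3
-- Month 7: TotalCustomersThisMonth = 2, TotalCustomersOverRange = 3
```

The monthly counts add up to 4 while the overall count is 3, because a@example.com is a customer in both months. One caveat worth checking against the real data: DENSE_RANK ranks NULL like any other value, so if the LEFT JOIN leaves CustomerEmail NULL on some rows, this expression counts NULL as one extra customer, whereas COUNT(DISTINCT CustomerEmail) would ignore it.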
--Calculate average amount of time between purchase
--Calculate percentage of quantity and total increase with each purchase.
--Return most valued customers.
--User defined base range
-- later on, more refined user defined customer base, so if the base range is 18 months and the customer range is 1 month, it will only check the data against customers that purchased orders within the last month.
-- over the customer range, we define who the customers are. We call this RANGE
-- over the orders base range, we define how many times they ordered. We call this BASE.
-- First we filter by month, returning total new orders and total recurring orders
-- (FOR OTHER REPORT, filter by state and not month)
-- Then within the month, we drill down to calculate how many customers are one orders, two orders, three orders, etc total
-- For each order amount, we calculate average days between orders, total value, lifetime value, and quantity changes
SELECT DISTINCT --*
Month
,(DENSE_RANK() over (partition by Month order by CustomerEmail asc)
+DENSE_RANK() over (partition by Month order by CustomerEmail desc))
-1 as TotalCustomersThisMonth
,Sum(Case When AmountOrdersOverRangeByCustomer = 1 and RangeOrderNumber = 1 then 1 else 0 end) over (partition by Month) NewCustomersOverRangeThisMonth --Some of those customers aren't really new, if we expand to the base.
,Sum(Case When AmountOrdersOverRangeByCustomer = 1 and AmountOrdersOverBaseByCustomer = 1 then 1 else 0 end) over (partition by Month) NewCustomersOverBaseThisMonth
,Sum(Case When AmountOrdersOverRangeByCustomer = 1 and AmountOrdersOverBaseByCustomer > 1 then 1 else 0 end) over (partition by Month) RecurringCustomersOverBaseButNewInRangeThisMonth -- Customers in Base who are not in range.
,Sum(Case When AmountOrdersOverRangeByCustomer > 1 and RangeOrderNumber =1 then 1 else 0 end) over (partition by Month) RecurringCustomerOverRangeThisMonth
,TTT.NewCustomersOverRange
,TTT.NewCustomersOverBase
,TTT.RecurringCustomersOverBaseButNewInRange
,TTT.RecurringCustomerOverRange
,TTT.TotalCustomersOverBase
,TTT.TotalCustomersOverRange
FROM --This table calculates new and recurring customers.
(
SELECT
*
,Sum(Case When AmountOrdersOverRangeByCustomer = 1 and RangeOrderNumber = 1 then 1 else 0 end) over () NewCustomersOverRange --Some of those customers aren't really new, if we expand to the base.
,Sum(Case When AmountOrdersOverRangeByCustomer = 1 and AmountOrdersOverBaseByCustomer = 1 then 1 else 0 end) over () NewCustomersOverBase
,Sum(Case When AmountOrdersOverRangeByCustomer = 1 and AmountOrdersOverBaseByCustomer > 1 then 1 else 0 end) over () RecurringCustomersOverBaseButNewInRange -- Customers in Base who are not in range
,Sum(Case When AmountOrdersOverRangeByCustomer > 1 and RangeOrderNumber =1 then 1 else 0 end) over () RecurringCustomerOverRange
FROM -- This table gives you Order Numbers Per Customer
(
SELECT
*
,ROW_NUMBER() over (partition by CustomerEmail order by Date_Created asc) RangeOrderNumber
,(DENSE_RANK() over (partition by CustomerEmail order by Date_Created asc)
+DENSE_RANK() over (partition by CustomerEmail order by Date_Created desc))
-1 as AmountOrdersOverRangeByCustomer
,Max(BaseOrderNumber) over (partition by CustomerEmail) AmountOrdersOverBaseByCustomer
,DENSE_RANK() over (order by CustomerEmail asc)
+DENSE_RANK() over (order by CustomerEmail desc)
- 1 as TotalCustomersOverRange
FROM --This table gives you a line by line basis of every order
(
Select
I.Date_Created
,I.Company_Code
,I.Division_Code
,I.Invoice_Number
,Sh.CUST_PO
,I.Total_Quantity
,ID.Total
,SH.Ship_City City
,CASE WHEN SH.Ship_Cntry <> 'US' THEN 'INT' ELSE SH.Ship_prov END State
,SH.Ship_Zip Zip
,SH.Ship_Cntry Country
,Month(I.Date_Created) Month
,S.CustomerEmail
,Count(*) over (partition by CustomerEmail order by Date_Created asc) BaseOrderNumber
,dense_rank() over (order by CustomerEmail)
+ dense_rank() over (order by CustomerEmail desc)
- 1 as TotalCustomersOverBase
--,Count(Distinct CustomerEmail) over () as TotalCustomersOverBase
--,ROW_NUMBER() over (partition by S.CustomerEmail order by Date_Created asc) PurchaseCount --this goes somewhere else
--,(DENSE_RANK() over (partition by Month(I.Date_Created) order by CustomerEmail asc)
--+DENSE_RANK() over (partition by Month(I.Date_Created) order by CustomerEmail desc))
---1 as TotalCustomersThisMonth
FROM [JMNYC-AMTDB].[AMTPLUS].[dbo].Invoices I (nolock)
LEFT JOIN (SELECT
ID.Company_Code
,ID.Division_Code
,ID.Invoice_Number
,SUM (ID.Price* ID.Quantity) Total
FROM [JMNYC-AMTDB].[AMTPLUS].[dbo].Invoices_Detail ID (nolock)
GROUP BY ID.Company_Code, ID.Division_Code, ID.Invoice_Number) ID
ON I.Company_Code = ID.Company_Code
AND I.Division_Code = ID.Division_Code
AND I.Invoice_Number = ID.Invoice_Number
LEFT JOIN
[JMDNJ-ACCELSQL].[A1WAREHOUSE].[dbo].SHIPHIST SH (nolock) ON I.Pickticket_Number = SH.Packslip
LEFT JOIN
[JMDNJ-ACCELSQL].[A1WAREHOUSE].[dbo].[SpraygroundMagentoCustomerEmailData] S on SH.CUST_PO = S.InvoiceNumber
Where I.Company_Code ='09' AND I.Division_Code = '001'
AND I.Customer_Number = 'ECOM2X'
AND ISNUMERIC(SH.CUST_PO) <> 0
AND I.Date_Created BETWEEN DATEADD(month, -12, '1/1/2017') AND '12/31/2016' -- Orders Base default is 12 months, options are 6,12, 18, and 24
--AND CustomerEmail is NULL
)T
Where T.Date_Created BETWEEN '6/1/2016' AND '1/1/2017'-- Customer Range
)TT
--
--Order By CustomerEmail, RangeOrderNumber asc
--
)TTT
--Order By Date_Created desc
--Order By CustomerEmail, RangeOrderNumber asc
Order By Month
Answer 0 (score: 2)
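When you use the DISTINCT operator, the returned data set contains only distinct rows. That means that if two or more rows hold the same value in every column, that row is returned only once.
In the first statement you return only the value of CustomerEmail, so you get a data set with one row for each unique value of CustomerEmail.
In the second statement you have both Month(I.Date_Created) and CustomerEmail, so you get one row for each unique combination of those two values. This means that the underlying data holds more than one row for some specific values of CustomerEmail, and that those rows have at least two different values of Month(I.Date_Created).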
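A small deterministic case may help here. The sketch below uses a hypothetical table variable, @Orders, with made-up rows (none of this comes from the real tables) and runs both forms of the SELECT DISTINCT against it.

```sql
-- Hypothetical sample data: a@example.com orders in June and in July.
DECLARE @Orders TABLE (CustomerEmail varchar(50), Date_Created date);

INSERT INTO @Orders (CustomerEmail, Date_Created)
VALUES ('a@example.com', '20160605'),
       ('a@example.com', '20160710'),
       ('b@example.com', '20160620'),
       ('c@example.com', '20160702'),
       ('c@example.com', '20160715');

-- One column: one row per distinct email, so 3 rows.
SELECT DISTINCT CustomerEmail
FROM @Orders;

-- Two columns: one row per distinct (month, email) pair, so 4 rows,
-- because a@example.com appears under month 6 and under month 7.
SELECT DISTINCT MONTH(Date_Created) AS [Month], CustomerEmail
FROM @Orders;
```

The two orders that c@example.com placed in July still collapse into a single row in the second query, since both the month and the email are identical.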
As a simple demonstration, take a look at the following statement:
WITH N AS(
SELECT *
FROM (VALUES(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL)) N (N)),
Tally AS(
SELECT LEFT(NEWID(),1) AS C,
ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS I
FROM N N1, N N2, N N3, N N4, N N5)
SELECT DISTINCT C
FROM Tally;
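Although the Tally CTE generates 100,000 rows, you will most likely get a data set of only 16 rows (one per hexadecimal digit). Now let's put both columns in the DISTINCT: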
WITH N AS(
SELECT *
FROM (VALUES(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL)) N (N)),
Tally AS(
SELECT LEFT(NEWID(),1) AS C,
ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS I
FROM N N1, N N2, N N3, N N4, N N5)
SELECT DISTINCT C,I
FROM Tally;
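Now, even though the underlying data has not changed, you get 100,000 rows. (Strictly speaking, NEWID will have generated new values, but that does not matter for the purposes of this example.)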
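If the goal is to report both figures side by side, one possible approach is to count distinct customers per month with GROUP BY and attach the overall distinct count from a separate aggregate. This is only a sketch under assumptions: @Orders and its rows are hypothetical stand-ins for the joined invoice data, and it assumes the per-month and over-range totals are meant to be separate measures rather than figures that should add up.

```sql
-- Hypothetical sample data standing in for the joined invoice rows.
DECLARE @Orders TABLE (CustomerEmail varchar(50), Date_Created date);

INSERT INTO @Orders (CustomerEmail, Date_Created)
VALUES ('a@example.com', '20160605'),
       ('a@example.com', '20160710'),
       ('b@example.com', '20160620'),
       ('c@example.com', '20160702'),
       ('c@example.com', '20160715');

SELECT MONTH(o.Date_Created)           AS [Month],
       COUNT(DISTINCT o.CustomerEmail) AS TotalCustomersThisMonth,
       t.TotalCustomersOverRange
FROM @Orders AS o
CROSS JOIN (SELECT COUNT(DISTINCT CustomerEmail) AS TotalCustomersOverRange
            FROM @Orders) AS t          -- single-row overall count
GROUP BY MONTH(o.Date_Created), t.TotalCustomersOverRange
ORDER BY [Month];
-- Month 6: 2 customers this month, 3 over the range
-- Month 7: 2 customers this month, 3 over the range
```

Both numbers are correct answers to different questions: the monthly column counts a customer once in every month they ordered, while the over-range column counts each customer once in total, so the monthly values are not expected to sum to the overall value.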