Consider the following table and pseudo-query:
Distinct Customers
WHERE
Most common PaymentMethod = 'CreditCard'
AND
Most common DeliveryService = '24hr'
Customer   TransID   PaymentMethod   DeliveryService
-----------------------------------------------------
Susan      1         CreditCard      24hr
Susan      2         CreditCard      24hr
Susan      3         Cash            24hr
John       4         CreditCard      48hr
John       5         CreditCard      48hr
Diane      6         CreditCard      24hr
Steve      7         Paypal          24hr
Steve      8         CreditCard      48hr
Steve      9         Paypal          24hr
Should return (2) records:
Customer
---------
Susan
Diane
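Another way to look at it is that I want to exclude the minority cases. For example, I don't want to return 'Steve': although he used a credit card once, he generally doesn't. I only care about majority behaviour, across multiple columns.
In reality there are many more columns (tens of them) to which the same principle applies, so the technique I adopt has to scale to searching at least 100k records.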
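To pin down what the pseudo-query means, here is a minimal Python sketch of the intended semantics on the sample rows. It is only an illustration, not a proposed solution: the names `most_common_values` and `customers_matching` are hypothetical, and it assumes that a value tied for the highest count still counts as "most common".

```python
from collections import Counter, defaultdict

# Sample rows from the question: (Customer, TransID, PaymentMethod, DeliveryService)
ROWS = [
    ("Susan", 1, "CreditCard", "24hr"),
    ("Susan", 2, "CreditCard", "24hr"),
    ("Susan", 3, "Cash", "24hr"),
    ("John", 4, "CreditCard", "48hr"),
    ("John", 5, "CreditCard", "48hr"),
    ("Diane", 6, "CreditCard", "24hr"),
    ("Steve", 7, "Paypal", "24hr"),
    ("Steve", 8, "CreditCard", "48hr"),
    ("Steve", 9, "Paypal", "24hr"),
]
COLUMNS = {"PaymentMethod": 2, "DeliveryService": 3}


def most_common_values(rows, column_index):
    """Map each customer to the set of values tied for most frequent in one column."""
    counts = defaultdict(Counter)
    for row in rows:
        counts[row[0]][row[column_index]] += 1
    result = {}
    for customer, counter in counts.items():
        top = max(counter.values())
        # Assumption: ties are kept, so several values can be "most common".
        result[customer] = {value for value, n in counter.items() if n == top}
    return result


def customers_matching(rows, wanted):
    """Customers whose most common value equals the wanted value in every listed column."""
    matching = {row[0] for row in rows}
    for column, value in wanted.items():
        modes = most_common_values(rows, COLUMNS[column])
        matching &= {c for c, values in modes.items() if value in values}
    return sorted(matching)


result = customers_matching(ROWS, {"PaymentMethod": "CreditCard", "DeliveryService": "24hr"})
print(result)  # ['Diane', 'Susan']
```

Adding another column only means adding another entry to `wanted`, which is the property the real solution needs to have as well.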
Answer 0 (score: 1)
CREATE TABLE #D
(
Customer VARCHAR(50), TransID INT, PaymentMethod VARCHAR(50), DeliveryService VARCHAR(50)
)
INSERT INTO #D VALUES
('Susan',1,'CreditCard','24hr'),
('Susan',2,'CreditCard','24hr'),
('Susan',3,'Cash','24hr'),
('John ',4,'CreditCard','48hr'),
('John ',5,'CreditCard','48hr'),
('Diane',6,'CreditCard','24hr'),
('Steve',7,'Paypal','24hr'),
('Steve',8,'CreditCard','48hr'),
('Steve',9,'Paypal','24hr')
;with cte as
(
SELECT *,row_number() OVER (PARTITION BY PaymentMethod,Customer ORDER BY TransID) AS RN FROM #D
)
select DISTINCT Customer FROM cte where PaymentMethod = 'CreditCard'
AND DeliveryService = '24hr' and rn>1
Answer 1 (score: 1)
Try this:
CREATE TABLE #TEMP (Customer varchar(20),TransID INT, PaymentMethod varchar(20),DeliveryService VARCHAR(10))
INSERT INTO #TEMP VALUES
('Susan',1,'CreditCard','24hr'),
('Susan',2,'CreditCard','24hr'),
('Susan',3,'Cash','24hr'),
('John',4,'CreditCard','48hr'),
('John',5,'CreditCard','48hr'),
('Diane',6,'CreditCard','24hr'),
('Steve',7,'Paypal','24hr'),
('Steve',8,'CreditCard','48hr'),
('Steve',9,'Paypal','24hr');
SELECT DISTINCT Customer FROM (
SELECT ROW_NUMBER () OVER (PARTITION BY PaymentMethod,Customer ORDER BY Customer) AS RNPaymentMethod,
ROW_NUMBER () OVER (PARTITION BY DeliveryService,Customer ORDER BY Customer) AS RNDeliveryService,Customer,TransID,PaymentMethod,DeliveryService FROM #TEMP) X
WHERE X.PaymentMethod = 'CreditCard' AND X.DeliveryService = '24hr' AND X.RNPaymentMethod=1 AND X.RNDeliveryService=1
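PS: I also kept an extra row number for the delivery service, since you mentioned that we need to look at majority behaviour across multiple columns.
Hope this helps!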
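Answer 2 (score: 1)
OK, let me try.
According to the question, you need to know the most common occurrence, so I think you have to declare a function that returns exactly that:
For this example I used the same values as the temp table, but I created a permanent table (tempD), because otherwise I could not create and test the functions. I'm sure these functions can be optimised, but I didn't have time to do more.
With functions, you can modify the formula and make it fit your criteria.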
create function most_common_payment(@customer varchar(100))
returns varchar(100)
as
begin
declare @total int, @payment varchar(100), @max_times int
-- total records
select @total = COUNT(*) from tempD where Customer=@customer;
if @total = 0 return ''
-- payment method with the most occurrences
select top 1 @payment = PaymentMethod, @max_times = count(*)
from tempd
where Customer = @customer
group by Customer, PaymentMethod
order by COUNT(*) desc;
if @max_times <= 1 return '';
-- percentage: blank out the result if the top value covers less than 50% of the rows
if ((@max_times * 100) / @total) < 50 set @payment = '';
return @payment;
end
go
The same for DeliveryService:
create function most_common_delivery(@customer varchar(100))
returns varchar(100)
as
begin
declare @total int, @delivery varchar(100), @max_times int
-- total records
select @total = COUNT(*) from tempD where Customer=@customer;
if @total = 0 return ''
-- delivery service with the most occurrences
select top 1 @delivery = DeliveryService, @max_times = count(*)
from tempd
where Customer = @customer
group by Customer, DeliveryService
order by COUNT(*) desc;
if @max_times <= 1 return '';
-- percentage: blank out the result if the top value covers less than 50% of the rows
if ((@max_times * 100) / @total) < 50 set @delivery = '';
return @delivery;
end
OK, now I can query the desired result:
select distinct
Customer
,dbo.most_common_payment(tempd.Customer) as MostCommonPayment
,dbo.most_common_delivery(tempd.Customer) as MostCommonDelivery
from
tempd
where
dbo.most_common_payment(tempd.Customer) = 'CreditCard'
and dbo.most_common_delivery(tempd.Customer) = '24hr'
This is the result:
Customer   MostCommonPayment   MostCommonDelivery
--------   -----------------   ------------------
Susan      CreditCard          24hr
Without the filter:
Customer   MostCommonPayment   MostCommonDelivery
--------   -----------------   ------------------
Diane
John       CreditCard          48hr
Steve      Paypal              24hr
Susan      CreditCard          24hr
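For anyone who wants to check the rule outside SQL Server, the logic of the two functions can be sketched in Python. This is only an illustration under assumptions: the helper name `most_common_or_blank` is hypothetical, and ties are broken by first occurrence here, whereas `top 1` in T-SQL may pick either tied value.

```python
from collections import Counter


def most_common_or_blank(values):
    """Return the most frequent value, or '' when it fails the checks used above."""
    total = len(values)
    if total == 0:
        return ""
    value, max_times = Counter(values).most_common(1)[0]
    # Same guard as "if @max_times <= 1 return ''": a single occurrence is not a pattern.
    if max_times <= 1:
        return ""
    # Same integer arithmetic as "(@max_times * 100) / @total < 50".
    if (max_times * 100) // total < 50:
        return ""
    return value


# Per-customer values taken from the sample table.
payments = {
    "Susan": ["CreditCard", "CreditCard", "Cash"],
    "John": ["CreditCard", "CreditCard"],
    "Diane": ["CreditCard"],
    "Steve": ["Paypal", "CreditCard", "Paypal"],
}
deliveries = {
    "Susan": ["24hr", "24hr", "24hr"],
    "John": ["48hr", "48hr"],
    "Diane": ["24hr"],
    "Steve": ["24hr", "48hr", "24hr"],
}

summary = {
    customer: (most_common_or_blank(payments[customer]), most_common_or_blank(deliveries[customer]))
    for customer in payments
}
for customer in sorted(summary):
    print(customer, *summary[customer])
```

Note that Diane comes back blank in both columns because she has a single transaction, which is why she is missing from the filtered result. If single-transaction customers should count, the `max_times <= 1` guard is the line to relax.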
Answer 3 (score: 1)
One approach uses window functions and aggregation:
with cp as (
select customerid, paymentmethod, count(*) as cnt,
rank() over (partition by customerid order by count(*) desc) as seqnum
from t
group by customerid, paymentmethod
),
cd as (
select customerid, deliveryservice, count(*) as cnt,
rank() over (partition by customerid order by count(*) desc) as seqnum
from t
group by customerid, deliveryservice
)
select cp.customerid
from cp join
cd
on cp.customerid = cd.customerid
where (cp.seqnum = 1 and cp.PaymentMethod = 'CreditCard') and
(cd.seqnum = 1 and cd.DeliveryService = '24hr');
Because you need rankings along two different dimensions, I think you need two subqueries (or the equivalent).
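Since the question mentions tens of columns, one subquery per column gets repetitive to write by hand. Below is a rough sketch of generating the same rank-per-column pattern, run against SQLite through Python's `sqlite3` module rather than SQL Server. Everything here is an assumption for illustration: the table name `t`, the helper `majority_customers`, the `ALLOWED_COLUMNS` whitelist, and a SQLite build with window function support (3.25 or later). The count is computed in an inner subquery and ranked in the outer one, which is equivalent to ranking by `count(*)` directly.

```python
import sqlite3

ROWS = [
    ("Susan", 1, "CreditCard", "24hr"), ("Susan", 2, "CreditCard", "24hr"),
    ("Susan", 3, "Cash", "24hr"), ("John", 4, "CreditCard", "48hr"),
    ("John", 5, "CreditCard", "48hr"), ("Diane", 6, "CreditCard", "24hr"),
    ("Steve", 7, "Paypal", "24hr"), ("Steve", 8, "CreditCard", "48hr"),
    ("Steve", 9, "Paypal", "24hr"),
]
# Column names are interpolated into the SQL text, so only accept known ones.
ALLOWED_COLUMNS = {"PaymentMethod", "DeliveryService"}


def majority_customers(conn, wanted):
    """Customers whose top-ranked value matches `wanted` in every listed column."""
    if not wanted:
        raise ValueError("at least one column is required")
    ctes, conditions, params = [], [], []
    for i, (column, value) in enumerate(wanted.items()):
        if column not in ALLOWED_COLUMNS:
            raise ValueError(f"unexpected column: {column}")
        # One ranked CTE per column: count per (Customer, value), then rank by count.
        ctes.append(
            f"r{i} AS ("
            f" SELECT Customer, val,"
            f" RANK() OVER (PARTITION BY Customer ORDER BY cnt DESC) AS seqnum"
            f" FROM (SELECT Customer, {column} AS val, COUNT(*) AS cnt"
            f" FROM t GROUP BY Customer, {column}))"
        )
        conditions.append(f"(r{i}.seqnum = 1 AND r{i}.val = ?)")
        params.append(value)
    joins = "".join(
        f" JOIN r{i} ON r{i}.Customer = r0.Customer" for i in range(1, len(ctes))
    )
    sql = (
        "WITH " + ", ".join(ctes)
        + " SELECT r0.Customer FROM r0" + joins
        + " WHERE " + " AND ".join(conditions)
        + " ORDER BY r0.Customer"
    )
    return [row[0] for row in conn.execute(sql, params)]


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t (Customer TEXT, TransID INTEGER, PaymentMethod TEXT, DeliveryService TEXT)"
)
conn.executemany("INSERT INTO t VALUES (?, ?, ?, ?)", ROWS)

result = majority_customers(conn, {"PaymentMethod": "CreditCard", "DeliveryService": "24hr"})
print(result)  # ['Diane', 'Susan']
```

Because `rank()` gives tied values the same rank, a customer with a tie for first place matches on either tied value; use `row_number()` with an explicit tie-breaker instead if exactly one winner per customer is wanted.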