My transaction data has many columns, but its general structure is as follows:
Basket_ID Product_ID
basket1 product1
basket1 product2
basket1 product3
basket2 product1
basket2 product1
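Each basket has many rows, keyed by Basket_ID. Every instance of a product in a basket gets its own row, so two units of the same item still take up two rows. The basket data also contains payment types, promotion details and so on, but those can be eliminated by filtering on sales_quantity = 1.
By Product_ID, I want to get the total number of baskets in which the product was the only item, and I would also like that expressed as a percentage of the product's total purchases. For example, if product1 is sold 100 times across all baskets and is the only item in 5 of those baskets, the result is 5%.
I have some code that seems to return the counts correctly, but I am struggling with the % part. It is also far from elegant, so I am sure there must be a more efficient way.
This seems to work (although it is messy) to return the count of baskets, grouped by product_id, where the product is the only item in the basket: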
drop table if exists #tempbasket;

-- unique products basket size distribution
select report_transaction_ID
      ,count(product_id) as uniq_prods
into #tempbasket
from (
    select report_transaction_ID
          ,product_id
    from Transactions
    where convert(date, TRANSACTION_DATE) between '2019-02-04' and '2019-04-04'
      and basket_id is not null
      and PRODUCT_ID is not null
      and sales_quantity = 1
) q1
group by REPORT_TRANSACTION_ID
having count(PRODUCT_ID) = 1;

select product_id
      ,count(q1.report_transaction_ID) as num_single_item_baskets
from (
    select report_transaction_ID
          ,product_id
    from Transactions
    where convert(date, TRANSACTION_DATE) between '2019-02-04' and '2019-04-04'
      and basket_id is not null
      and PRODUCT_ID is not null
      and sales_quantity = 1
) q1
inner join #tempbasket t2 on q1.REPORT_TRANSACTION_ID = t2.REPORT_TRANSACTION_ID
where uniq_prods = 1
group by product_id
order by count(q1.report_transaction_ID) desc;
And then my even messier attempt at the % part:
select q1.product_id
      ,count(q1.report_transaction_ID) as num_single_item_baskets
      ,count(q2.report_transaction_ID) as total_baskets
      ,(count(q1.report_transaction_ID)*1.00)/(count(q2.report_transaction_ID)*1.00) as pct_single_item_baskets
from (
    select report_transaction_ID
          ,product_id
    from Transactions
    where convert(date, TRANSACTION_DATE) between '2019-02-04' and '2019-02-04'
      and basket_id is not null
      and PRODUCT_ID is not null
      and sales_quantity = 1
) q1
inner join #tempbasket t2 on q1.REPORT_TRANSACTION_ID = t2.REPORT_TRANSACTION_ID
inner join (
    select report_transaction_ID
          ,product_id
    from Transactions
    where convert(date, TRANSACTION_DATE) between '2019-02-04' and '2019-02-04'
      and basket_id is not null
      and PRODUCT_ID is not null
      and sales_quantity = 1
) q2 on q1.PRODUCT_ID = q2.product_id
group by q1.product_id
order by count(q1.report_transaction_ID) desc;
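The second query was still running after 30 minutes, so I am not sure what it returns. The first query does run, although it takes a while, and its results look roughly as expected.
Any help is appreciated. I am sure there is a better way than this!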
Answer 0 (score: 0)
Hmmm . . . You can use two levels of aggregation to answer your main question:
select count(*)
from (select Basket_ID
      from transactions
      group by Basket_ID
      -- min = max means every row in the basket carries the same product
      having min(Product_ID) = max(Product_ID)
     ) b;
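
For the per-product count and percentage, the same min/max idea can be moved into window functions. This is only a sketch under a few assumptions: it uses the simplified Basket_ID / Product_ID columns from the sample rather than the real column names, it treats a basket as "single item" when all of its rows have the same product (so a basket with two units of product1 counts), and it takes the denominator to be the number of distinct baskets containing the product. The output column names are made up for illustration.

```sql
-- Sketch only: table and column names follow the simplified sample in the question.
select Product_ID
      ,count(distinct case when is_single = 1 then Basket_ID end) as num_single_item_baskets
      ,count(distinct Basket_ID) as total_baskets
      -- 100.0 forces decimal arithmetic instead of integer division
      ,100.0 * count(distinct case when is_single = 1 then Basket_ID end)
             / count(distinct Basket_ID) as pct_single_item_baskets
from (
    select Basket_ID
          ,Product_ID
          -- flag rows whose basket contains exactly one distinct product
          ,case when min(Product_ID) over (partition by Basket_ID)
                   = max(Product_ID) over (partition by Basket_ID)
                then 1 else 0 end as is_single
    from transactions
    where Basket_ID is not null
      and Product_ID is not null
) t
group by Product_ID
order by num_single_item_baskets desc;
```

On the five sample rows, product1 appears in two baskets and is the only product in one of them (basket2), so it would come out at 50%. If "only item" should instead mean exactly one row in the basket, the flag could be changed to count(*) over (partition by Basket_ID) = 1. Because the table is read once and never joined to itself on Product_ID, this shape avoids the row multiplication in the second query from the question.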