I am trying to do some reporting in SQL Server. Here is the basic table setup:
Order (ID, DateCreated, Status)
Product (ID, Name, Price)
Order_Product_Mapping (OrderID,ProductID,Quantity,Price,DateOrdered)
I want to create a report that groups products with similar sales over a period of time, like this:
Sales over 1 month:
- Coca, Pepsi, Tiger: $20000 avg (Coca: $21000, Pepsi: $19000, Tiger: $20000)
- Bread, Meat: $10000 avg (Bread: $11000, Meat: $9000)
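Note that the text in () is only for clarification and is not part of the report. The user defines how much variation between sales still counts as similar. For example, sales that differ by less than 5% are considered similar and should be grouped together. The time period is also user-defined.
I can calculate total sales over a period, but I have no idea how to group products by how much their sales differ. I am using SQL Server 2012. Any help is appreciated.
Sorry, my English is not very good :)
Update: I figured out what I actually need ;)
For a known array of numbers such as: 1, 2, 3, 50, 52, 100, 102, 105
I need to put them into groups of at least 3 numbers, where the difference between any two items in a group is less than 10.
For the array above, the output should be:
[1, 2, 3]
[100, 102, 105]
=> The algorithm takes 3 parameters: the array, the minimum number of items that form a group, and the maximum difference between 2 items.
How can I implement this in C#?
Answer 0: (score: 1)
I can't believe I did it~~~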
-- The threshold is the key to this query:
-- if the difference between two values is no more than the threshold,
-- the two values belong to the same group.
-- In your case, I think it is 200.
DECLARE @th int
SET @th = 200
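-- Very simple: calculate the total price per product for a time range.
-- Price and quantity both come from the mapping table; the range is half-open
-- so that orders placed exactly at the start of the month are included.
;WITH totals AS (
    SELECT p.name AS col, sum(op.price * op.quantity) AS val
    FROM order_product_mapping op
    JOIN [order] o ON o.id = op.orderid
    JOIN product p ON p.id = op.productid
    WHERE op.dateordered >= '2013-03-01' AND op.dateordered < '2013-04-01'
    GROUP BY p.name
),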
-- Give each row a row number, ordered from the highest total to the lowest
cte_rn AS (
    SELECT col, val, row_number() OVER (ORDER BY val DESC) rn
    FROM totals
),
-- The show starts now.
-- First, we let each row know the row number of the row before it
-- (the first row points to itself)
cte_last_rn AS (
    SELECT col, val, rn, CASE WHEN rn = 1 THEN 1 ELSE rn - 1 END lrn
    FROM cte_rn
),
-- Then we join each row to the row before it and calculate the difference
-- between the previous row's total and the current row's total.
-- If the difference is more than the threshold we mark it 1, otherwise 0.
-- Note that rn must be the current row's own number (c1.rn), not the previous row's
cte_range AS (
    SELECT
        c1.col, c1.val,
        CASE
            WHEN c2.val - c1.val <= @th THEN 0
            ELSE 1
        END AS range,
        c1.rn
    FROM cte_last_rn c1
    JOIN cte_rn c2 ON c1.lrn = c2.rn
),
-- Even trickier here:
-- now we join the last CTE to itself and, for each row, sum the flags
-- (the 0/1 values calculated previously) of all rows up to and including the current row
cte_rank AS (
    SELECT c1.col, c1.val, sum(c2.range) rank
    FROM cte_range c1
    JOIN cte_range c2 ON c1.rn >= c2.rn
    GROUP BY c1.col, c1.val
)
-- Now the totals are properly grouped, and we can group by their rank
SELECT
avg(c1.val) AVG,
(
SELECT c2.col + ', ' AS 'data()'
FROM cte_rank c2
WHERE c2.rank = c1.rank
ORDER BY c2.val desc
FOR xml path('')
) product,
(
SELECT cast(c2.val AS nvarchar(MAX)) + ', ' AS 'data()'
FROM cte_rank c2
WHERE c2.rank = c1.rank
ORDER BY c2.val DESC
FOR xml path('')
) price
FROM cte_rank c1
GROUP BY c1.rank
HAVING count(1) > 2
The result looks like this:
AVG  PRODUCT  PRICE
28   A, B, C  30, 29, 27
12   D, E, F  15, 12, 10
3    G, H, I  4, 3, 2
To understand how I did the concatenation, read: Concatenate many rows into a single text string?
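For readers who find the CTE chain hard to follow, the grouping idea in the query above can be sketched in C#. This is only an illustration, not the author's code: `GapGrouping.GroupByGap` and its parameters are hypothetical names, and the sketch assumes the totals have already been computed per product.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class GapGrouping
{
    // Hypothetical helper that mirrors the CTE chain; the name and signature are assumptions.
    public static List<List<KeyValuePair<string, decimal>>> GroupByGap(
        IEnumerable<KeyValuePair<string, decimal>> totals, decimal threshold, int minItems)
    {
        // Like cte_rn: order the totals from highest to lowest.
        var ordered = totals.OrderByDescending(t => t.Value).ToList();

        var groups = new List<List<KeyValuePair<string, decimal>>>();
        for (var i = 0; i < ordered.Count; i++)
        {
            // Like cte_range and cte_rank: a gap to the previous row that exceeds
            // the threshold starts a new group (the running sum of flags increases).
            if (i == 0 || ordered[i - 1].Value - ordered[i].Value > threshold)
                groups.Add(new List<KeyValuePair<string, decimal>>());

            groups[groups.Count - 1].Add(ordered[i]);
        }

        // Like the HAVING clause: keep only groups that are large enough.
        return groups.Where(g => g.Count >= minItems).ToList();
    }
}
```

With the totals from the result table above (A = 30, B = 29, ... I = 2), an assumed threshold of 3 and a minimum of 3 items, this returns the groups A, B, C / D, E, F / G, H, I. Because only neighbouring rows are compared, a long chain of small gaps can produce a group whose first and last totals differ by more than the threshold.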
Answer 1: (score: 1)
By the way, if you just want C#:
using System.Collections.Generic;
using System.Linq;

var maxDifference = 10;
var minItems = 3;
// I just assume your list is not ordered, so order it first
var array = (new List<int> {3, 2, 50, 1, 51, 100, 105, 102}).OrderBy(a => a);
var result = new List<List<int>>();
var group = new List<int>();
var lastNum = array.First();
// running distance between the current number and the first number of the current group
var totalDiff = 0;
foreach (var n in array)
{
    totalDiff += n - lastNum;
    // if the distance between the current number and the first number in the
    // current group is within the threshold, add it to the current group
    if (totalDiff <= maxDifference)
    {
        group.Add(n);
        lastNum = n;
        continue;
    }
    // if the current group has at least minItems items, add it to the final result
    if (group.Count >= minItems)
        result.Add(group);
    // start a new group
    group = new List<int>() { n };
    lastNum = n;
    totalDiff = 0;
}
// don't forget the last group
if (group.Count >= minItems)
    result.Add(group);
The key here is that the array needs to be sorted, so you don't have to jump around or store values to calculate the distance.
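The same idea can be wrapped in a reusable method that takes the three parameters named in the question. The following is a minimal sketch under stated assumptions: `NumberGrouping.GroupNearby` is a hypothetical name, and it uses a strict "less than" comparison because the question asks for differences of less than 10, whereas the snippet above also accepts a difference equal to the threshold.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class NumberGrouping
{
    // Hypothetical helper; the name and signature are assumptions for illustration.
    public static List<List<int>> GroupNearby(
        IEnumerable<int> values, int minItems, int maxDifference)
    {
        var result = new List<List<int>>();
        var group = new List<int>();

        foreach (var n in values.OrderBy(v => v))
        {
            // In sorted order the first item of a group is its minimum, so comparing
            // against it bounds the difference between any two items in the group.
            if (group.Count > 0 && n - group[0] >= maxDifference)
            {
                if (group.Count >= minItems)
                    result.Add(group);
                group = new List<int>();
            }
            group.Add(n);
        }

        // The last group is still pending when the loop ends.
        if (group.Count >= minItems)
            result.Add(group);

        return result;
    }
}
```

Calling `NumberGrouping.GroupNearby(new[] { 1, 2, 3, 50, 52, 100, 102, 105 }, 3, 10)` yields [1, 2, 3] and [100, 102, 105]; the pair 50, 52 is dropped because it has fewer than 3 items. An empty input simply returns an empty list.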
Answer 2: (score: 0)
This query should produce what you expect; it shows the product sales for each month in which you have orders:
SELECT CONVERT(CHAR(4), OP.DateOrdered, 100) + CONVERT(CHAR(4), OP.DateOrdered, 120) As Month ,
Product.Name ,
AVG( OP.Quantity * OP.Price ) As Turnover
FROM Order_Product_Mapping OP
INNER JOIN Product ON Product.ID = OP.ProductID
GROUP BY CONVERT(CHAR(4), OP.DateOrdered, 100) + CONVERT(CHAR(4), OP.DateOrdered, 120) ,
Product.Name
Not tested, but if you provide sample data I can work with it.
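Answer 3: (score: 0)
It looks like I was making things more complicated than they need to be. Here is what should solve the problem:
- Run a query to get the sales for each product.
- Run K-means or some similar algorithm.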
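The second step could look roughly like the sketch below, which runs a plain one-dimensional K-means (Lloyd's algorithm) over the per-product totals. Everything here is an assumption made for illustration: `SalesClustering.KMeans1D` is a hypothetical name, the number of clusters `k` has to be chosen by the caller, and the initial centroids are spread over the sorted values to keep the run deterministic. K-means does not directly express a rule such as "within 5%", so the resulting clusters may need to be checked against that rule afterwards.

```csharp
using System;
using System.Linq;

public static class SalesClustering
{
    // Hypothetical helper: returns, for each value, the index of its cluster.
    public static int[] KMeans1D(double[] values, int k, int maxIterations = 100)
    {
        if (values.Length == 0 || k < 1 || k > values.Length)
            throw new ArgumentException("k must be between 1 and the number of values.");

        // Deterministic start: spread the initial centroids over the sorted values.
        var sorted = values.OrderBy(v => v).ToArray();
        var centroids = new double[k];
        for (var c = 0; c < k; c++)
            centroids[c] = sorted[(int)((c + 0.5) * sorted.Length / k)];

        // -1 means "not assigned yet", so the first pass always counts as a change.
        var labels = Enumerable.Repeat(-1, values.Length).ToArray();
        for (var iter = 0; iter < maxIterations; iter++)
        {
            var changed = false;

            // Assignment step: attach each value to its nearest centroid.
            for (var i = 0; i < values.Length; i++)
            {
                var best = 0;
                for (var c = 1; c < k; c++)
                    if (Math.Abs(values[i] - centroids[c]) < Math.Abs(values[i] - centroids[best]))
                        best = c;
                if (labels[i] != best) { labels[i] = best; changed = true; }
            }

            // No assignment changed, so the centroids are already up to date.
            if (!changed) break;

            // Update step: move each centroid to the mean of its members.
            for (var c = 0; c < k; c++)
            {
                var members = values.Where((v, i) => labels[i] == c).ToArray();
                if (members.Length > 0)
                    centroids[c] = members.Average();
            }
        }
        return labels;
    }
}
```

For the sample totals in the question (21000, 19000, 20000, 11000, 9000) and `k = 2`, the three drinks end up in one cluster and bread and meat in the other.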