Question

I am trying to do some reporting in SQL Server. Here is the basic table setup:

Order (ID, DateCreated, Status)

Product (ID, Name, Price)

Order_Product_Mapping (OrderID, ProductID, Quantity, Price, DateOrdered)

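From these tables, I want to create a report that groups products with similar sales amounts over a time period:

Sales over 1 month:

Coca, Pepsi, Tiger: $20000 avg (Coca: $21000, Pepsi: $19000, Tiger: $20000)

Bread, Meat: $10000 avg (Bread: $11000, Meat: $9000)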

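Note that the text in () is only for clarification and is not part of the report. The user defines how large a difference between sales still counts as similar. For example, sales that differ by less than 5% are considered similar and should be grouped together. The time period is also user-defined.

I can calculate the total sales over a period, but I have no idea how to group the products by how much their sales differ. I am using SQL Server 2012. Any help is appreciated.

Sorry, my English is not very good :)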

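Update: *I figured out what I really need ;)*

For a known array of numbers, such as: 1, 2, 3, 50, 52, 100, 102, 105

I need to group them into groups of at least 3 numbers, where the difference between any two items in a group is less than 10.

For the array above, the output should be:

[1,2,3]

[100,102,105]

=> The algorithm takes 3 parameters: the array, the minimum number of items needed to form a group, and the maximum difference between 2 items.

How can I implement this in C#?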

Answer 1

I can't believe I actually did it~~~

-- the threshold is the key to this query:
-- if the difference between two adjacent totals is at most the threshold,
-- the two values belong to the same group
-- in your case, I think it is 200
DECLARE @th int
SET @th = 200

-- very simple: calculate the total sales per product for a time range
-- (the price comes from order_product_mapping; the order table has no price column)
;WITH totals AS (
  SELECT p.name AS col, sum(op.price * op.quantity) AS val
  FROM order_product_mapping op
  JOIN [order] o ON o.id = op.orderid
  JOIN product p ON p.id = op.productid
  WHERE op.dateordered >= '2013-03-01' AND op.dateordered < '2013-04-01'
  GROUP BY p.name
),
-- number the rows from the highest total to the lowest
cte_rn AS (
  SELECT col, val, row_number() OVER (ORDER BY val DESC) AS rn
  FROM totals
),
-- the show starts now:
-- first, let each row know the number of the row before it
-- (the first row points at itself)
cte_last_rn AS (
  SELECT col, val, rn, CASE WHEN rn = 1 THEN 1 ELSE rn - 1 END AS lrn
  FROM cte_rn
),
-- then join each row to the row before it and compute the difference
-- between the previous row's total and the current row's total;
-- if the difference is more than the threshold, flag the row with 1, otherwise 0
cte_range AS (
  SELECT
    c1.col, c1.val,
    CASE
      WHEN c2.val - c1.val <= @th THEN 0
      ELSE 1
    END AS [range],
    c1.rn  -- the row's own number; an unqualified rn would resolve to c2.rn
  FROM cte_last_rn c1
  JOIN cte_rn c2 ON c1.lrn = c2.rn
),
-- even trickier here:
-- join the previous CTE to itself and, for each row,
-- sum the 0/1 flags of that row and of every row before it
cte_rank AS (
  SELECT c1.col, c1.val, sum(c2.[range]) AS [rank]
  FROM cte_range c1
  JOIN cte_range c2 ON c1.rn >= c2.rn
  GROUP BY c1.col, c1.val
)
-- the totals are now properly grouped, so we can group on their rank
SELECT
  avg(c1.val) AS [AVG],
  (
    SELECT c2.col + ', ' AS 'data()'
    FROM cte_rank c2
    WHERE c2.[rank] = c1.[rank]
    ORDER BY c2.val DESC
    FOR XML PATH('')
  ) AS product,
  (
    SELECT cast(c2.val AS nvarchar(MAX)) + ', ' AS 'data()'
    FROM cte_rank c2
    WHERE c2.[rank] = c1.[rank]
    ORDER BY c2.val DESC
    FOR XML PATH('')
  ) AS price
FROM cte_rank c1
GROUP BY c1.[rank]
-- keep only groups with at least 3 products
HAVING count(1) > 2

The result looks like this:

AVG     PRODUCT     PRICE
28      A, B, C     30, 29, 27
12      D, E, F     15, 12, 10
3       G, H, I     4, 3, 2

To see how I did the concatenation, read: Concatenate many rows into a single text string?
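The grouping logic of the query (flag each row that starts a new group, then take a running sum of the flags) may be easier to follow outside SQL. Below is a minimal sketch of that idea in Python rather than T-SQL, so it can be run in isolation. The function name `group_by_gaps` and its parameters are hypothetical, and the threshold of 3 used in the example is an assumption chosen to fit the sample totals in the result table, not a value from the answer.

```python
from itertools import accumulate


def group_by_gaps(totals, threshold, min_items=3):
    """Group (name, value) pairs whose neighbouring values differ by <= threshold."""
    # Sort from the highest value to the lowest, like the row_number() step.
    rows = sorted(totals, key=lambda r: r[1], reverse=True)
    if not rows:
        return []
    # 1 marks a row whose gap to the previous row exceeds the threshold,
    # i.e. a row that starts a new group. The first row never starts one.
    breaks = [0] + [
        1 if prev[1] - cur[1] > threshold else 0
        for prev, cur in zip(rows, rows[1:])
    ]
    # The running sum of the flags gives every row its group number.
    ranks = accumulate(breaks)
    groups = {}
    for row, rank in zip(rows, ranks):
        groups.setdefault(rank, []).append(row)
    # Keep only groups that are large enough, like HAVING count(1) > 2.
    return [g for g in groups.values() if len(g) >= min_items]


# Sample totals taken from the result table; threshold=3 is an assumption.
sample = [("A", 30), ("B", 29), ("C", 27), ("D", 15), ("E", 12),
          ("F", 10), ("G", 4), ("H", 3), ("I", 2)]
for g in group_by_gaps(sample, threshold=3):
    print(g)
```

With these nine totals the sketch yields the same three groups as the table. Note that this approach only compares neighbouring rows, so the first and last members of a group can differ by more than the threshold (D and F differ by 5 here).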

Answer 2

By the way, if you just want C#:

using System.Collections.Generic;
using System.Linq;

var maxDifference = 10;
var minItems = 3;

// I just assume your list is not ordered, so order it first
var array = (new List<int> {3, 2, 50, 1, 51, 100, 105, 102}).OrderBy(a => a).ToList();

var result = new List<List<int>>();
var group = new List<int>();
var lastNum = array.First();
var totalDiff = 0;
foreach (var n in array)
{
    // totalDiff accumulates the distance between the current number
    // and the first number of the current group
    totalDiff += n - lastNum;

    // if that distance is less than the threshold, add n to the current group
    if (totalDiff < maxDifference)
    {
        group.Add(n);
        lastNum = n;
        continue;
    }

    // if the current group has minItems items or more, add it to the final result
    if (group.Count >= minItems)
        result.Add(group);

    // start a new group
    group = new List<int>() { n };
    lastNum = n;
    totalDiff = 0;
}

// don't forget the last group
if (group.Count >= minItems)
    result.Add(group);

The key here is that the array needs to be sorted, so you never have to jump around or store extra values to compute the distances.
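For comparison, here is the same sorted-scan idea as a function that takes the three parameters named in the question. It is a sketch in Python rather than C#, the name `group_close_numbers` is hypothetical, and it assumes the strict "less than" wording from the question. Because the input is sorted, the largest difference inside a group is always between the newest number and the group's first number, so only that one comparison is needed.

```python
def group_close_numbers(numbers, min_items, max_difference):
    """Return runs of sorted numbers in which every pair differs by < max_difference."""
    result = []
    group = []
    for n in sorted(numbers):
        # In a sorted run, n - group[0] is the largest pairwise difference.
        if group and n - group[0] >= max_difference:
            # n does not fit: keep the finished group only if it is big enough.
            if len(group) >= min_items:
                result.append(group)
            group = []
        group.append(n)
    # The last group is still open when the loop ends.
    if len(group) >= min_items:
        result.append(group)
    return result


print(group_close_numbers([1, 2, 3, 50, 52, 100, 102, 105], 3, 10))
# prints [[1, 2, 3], [100, 102, 105]]
```

This scan is greedy: it always extends the current group as far as it can, which is simple but is not guaranteed to pick the split a human would prefer when several valid groupings exist.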

Answer 3

This query should produce what you expect; it shows the product sales for each month in which you have orders:

SELECT CONVERT(CHAR(4), OP.DateOrdered, 100) + CONVERT(CHAR(4), OP.DateOrdered, 120) AS [Month],
       Product.Name,
       AVG(OP.Quantity * OP.Price) AS Turnover
FROM Order_Product_Mapping OP
INNER JOIN Product ON Product.ID = OP.ProductID
GROUP BY CONVERT(CHAR(4), OP.DateOrdered, 100) + CONVERT(CHAR(4), OP.DateOrdered, 120),
         Product.Name

Untested, but if you provide sample data I can work with it.

Answer 4

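It looks like I was making things more complicated than they need to be. Here is what should solve the problem:

- Run a query to get the sales for each product.

- Run K-means or a similar algorithm.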
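As a rough illustration of the second step, the sketch below runs a plain one-dimensional k-means (Lloyd's algorithm) over a list of sales totals. It is written in Python under stated assumptions: the function name `kmeans_1d` is hypothetical, the centroids are seeded with evenly spaced points of the sorted data to keep the result deterministic (k-means implementations commonly seed randomly), and the number of groups `k` has to be chosen up front, which the threshold-based approaches in the other answers do not require.

```python
def kmeans_1d(values, k, max_iter=100):
    """Cluster numbers into at most k groups with Lloyd's algorithm."""
    if not 1 <= k <= len(values):
        raise ValueError("k must be between 1 and the number of values")
    data = sorted(values)
    # Assumption: deterministic seeding from evenly spaced sorted values.
    if k == 1:
        centroids = [data[len(data) // 2]]
    else:
        centroids = [data[i * (len(data) - 1) // (k - 1)] for i in range(k)]
    clusters = []
    for _ in range(max_iter):
        # Assignment step: put each value into the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for v in data:
            nearest = min(range(k), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # Update step: move each centroid to the mean of its cluster.
        new_centroids = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break  # assignments are stable
        centroids = new_centroids
    # Drop clusters that ended up empty.
    return [c for c in clusters if c]


# Sales figures from the example in the question; k=2 is an assumption.
print(kmeans_1d([21000, 19000, 20000, 11000, 9000], k=2))
# prints [[9000, 11000], [19000, 20000, 21000]]
```

On the question's sample figures this separates the products around 10000 from those around 20000, matching the two groups in the example report.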

SQL Server: grouping similar sales together
