Question

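As a simplified example, I need to select every instance where a customer's shipping address differs from their previous shipping address. I have a large table with the following columns: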

purchase_id | cust_id | date | address  | description
-----------------------------------------------------------
 1          | 5       | jan  | address1 | desc1
 2          | 6       | jan  | address2 | desc2
 3          | 5       | feb  | address1 | desc3
 4          | 6       | feb  | address2 | desc4
 5          | 5       | mar  | address3 | desc5
 6          | 5       | mar  | address3 | desc6
 7          | 5       | apr  | address1 | desc7
 8          | 6       | may  | address4 | desc8

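Note that a customer can "move back" to a previous address, as customer 5 does in row 7.

What I want to select (as efficiently as possible, since this is a very large table) is the first row of each "block" in which a customer sends consecutive orders to the same address. In this example, that would be rows 1, 2, 5, 7, and 8. In all the other rows, the customer has the same address as in their previous order.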

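So effectively I want to first ORDER BY (cust_id, date), then SELECT purchase_id, cust_id, min(date), address, description.

However, I'm having trouble because SQL normally evaluates GROUP BY before ORDER BY. I therefore can't figure out how to adapt, for example, either of the top answers on this question (which I otherwise quite like). It is necessary (conceptually, at least) to order by date before grouping or using aggregate functions like min(); otherwise I would miss cases like row 7 in my example table, where the customer has "moved back" to a previous address.

Also note that two customers can share an address, so after ordering by date I effectively need to group by both cust_id and address.

I'm using Snowflake, which I believe supports most of the same commands as recent versions of PostgreSQL and SQL Server (although I'm fairly new to Snowflake, so I'm not completely sure).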

Answer 1

You can use the row_number window function for this:

;with cte as(select *, row_number() over(partition by cust_id, address
                                         order by purchase_id) as rn from table)
select * from cte 
where rn = 1
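
As a quick check of what this query selects, here is a minimal sketch that runs the same ROW_NUMBER logic against the question's sample rows. It uses Python's built-in sqlite3 module rather than Snowflake (an assumption made only so the example is runnable; it needs an SQLite build with window functions, 3.25 or later), and the table name purchases is likewise assumed.

```python
import sqlite3

# Sample rows from the question: (purchase_id, cust_id, address).
ROWS = [
    (1, 5, "address1"), (2, 6, "address2"), (3, 5, "address1"),
    (4, 6, "address2"), (5, 5, "address3"), (6, 5, "address3"),
    (7, 5, "address1"), (8, 6, "address4"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE purchases (purchase_id INTEGER, cust_id INTEGER, address TEXT)"
)
conn.executemany("INSERT INTO purchases VALUES (?, ?, ?)", ROWS)

# Same idea as the answer: number the rows within each (cust_id, address) pair.
query = """
WITH cte AS (
  SELECT purchase_id,
         ROW_NUMBER() OVER (PARTITION BY cust_id, address
                            ORDER BY purchase_id) AS rn
  FROM purchases
)
SELECT purchase_id FROM cte WHERE rn = 1 ORDER BY purchase_id
"""
first_per_address = [row[0] for row in conn.execute(query)]
print(first_per_address)  # [1, 2, 5, 8]
```

Because the partition is (cust_id, address), this returns the first purchase for each distinct address per customer. Purchase 7, where customer 5 moves back to address1, is not included, so this differs from the rows 1, 2, 5, 7, and 8 that the question asks for.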

Answer 2

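This is probably best solved with a subquery that finds each user's first purchase, then using IN to filter rows based on that result.

To clarify: purchase_id is an auto-increment column, right? If so, a purchase with a higher purchase_id must have been created later, and the following should suffice: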

SELECT *
FROM purchases
WHERE purchase_id IN (
  SELECT MIN(purchase_id) AS first_purchase_id
  FROM purchases
  GROUP BY cust_id
)

If you only want the first purchase for customers who have more than one address, add a HAVING clause to the subquery:

SELECT *
FROM purchases
WHERE purchase_id IN (
  SELECT MIN(purchase_id) AS first_purchase_id
  FROM purchases
  GROUP BY cust_id
  HAVING COUNT(DISTINCT address) > 1
)

Fiddle: http://sqlfiddle.com/#!9/12d75/6

However, if purchase_id is not an auto-increment column, then select both cust_id and min(date) in the subquery and use an INNER JOIN on cust_id and min(date):

SELECT *
FROM purchases
INNER JOIN (
  SELECT cust_id, MIN(date) AS min_date
  FROM purchases
  GROUP BY cust_id
  HAVING COUNT(DISTINCT address) > 1
) cust_purchase_date
  ON purchases.cust_id = cust_purchase_date.cust_id
  AND purchases.date = cust_purchase_date.min_date

The first query example will probably be faster, however, so use it if purchase_id is an auto-increment column.

Answer 3

Sorry for the late reply. I meant to respond to this post a few days ago.

The "most proper" way I can think of is to use the LAG function.

Take this:

select purchase_id, cust_id, address, 
lag(address, 1) over (partition by cust_id order by purchase_id) prev_address 
from x order by cust_id, purchase_id;
-------------+---------+----------+--------------+
 PURCHASE_ID | CUST_ID | ADDRESS  | PREV_ADDRESS |
-------------+---------+----------+--------------+
 1           | 5       | address1 | [NULL]       |
 3           | 5       | address1 | address1     |
 5           | 5       | address3 | address1     |
 6           | 5       | address3 | address3     |
 7           | 5       | address1 | address3     |
 2           | 6       | address2 | [NULL]       |
 4           | 6       | address2 | address2     |
 8           | 6       | address4 | address2     |
-------------+---------+----------+--------------+

Then you can easily detect the rows with the events you described:

select purchase_id, cust_id, address, prev_address from (
  select purchase_id, cust_id, address, 
  lag(address, 1) over (partition by cust_id order by purchase_id) prev_address 
  from x 
) sub 
where not equal_null(address, prev_address)
order by cust_id, purchase_id;
-------------+---------+----------+--------------+
 PURCHASE_ID | CUST_ID | ADDRESS  | PREV_ADDRESS |
-------------+---------+----------+--------------+
 1           | 5       | address1 | [NULL]       |
 5           | 5       | address3 | address1     |
 7           | 5       | address1 | address3     |
 2           | 6       | address2 | [NULL]       |
 8           | 6       | address4 | address2     |
-------------+---------+----------+--------------+

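Note that I use the EQUAL_NULL function to get NULL = NULL semantics.

Also note that the LAG function can be computationally intensive (though comparable to using ROW_NUMBER as proposed earlier).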
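
The LAG approach can be tried outside Snowflake as well. The following is a minimal sketch using Python's built-in sqlite3 module (an assumption for runnability; it requires SQLite 3.25 or later for window functions). SQLite has no EQUAL_NULL, so the sketch substitutes SQLite's null-safe IS NOT operator, and the table name purchases is assumed.

```python
import sqlite3

# Sample rows from the question: (purchase_id, cust_id, address).
ROWS = [
    (1, 5, "address1"), (2, 6, "address2"), (3, 5, "address1"),
    (4, 6, "address2"), (5, 5, "address3"), (6, 5, "address3"),
    (7, 5, "address1"), (8, 6, "address4"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE purchases (purchase_id INTEGER, cust_id INTEGER, address TEXT)"
)
conn.executemany("INSERT INTO purchases VALUES (?, ?, ?)", ROWS)

# Keep a row when its address differs from the customer's previous address.
# "IS NOT" treats NULL as a comparable value, so each customer's first row
# (where prev_address is NULL) is kept too.
query = """
SELECT purchase_id
FROM (
  SELECT purchase_id, address,
         LAG(address) OVER (PARTITION BY cust_id
                            ORDER BY purchase_id) AS prev_address
  FROM purchases
) AS sub
WHERE address IS NOT prev_address
ORDER BY purchase_id
"""
changed_rows = [row[0] for row in conn.execute(query)]
print(changed_rows)  # [1, 2, 5, 7, 8]
```

Unlike partitioning by (cust_id, address), comparing each row with its immediate predecessor picks up purchase 7, where customer 5 returns to address1.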

Answer 4

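Snowflake has introduced CONDITIONAL_CHANGE_EVENT, which solves the described case perfectly:

"Returns a window event number for each row within a window partition when the value of the argument expr1 in the current row is different from the value of expr1 in the previous row. The window event number starts from 0 and is incremented by 1 to indicate the number of changes so far within that window."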

Data preparation:

CREATE OR REPLACE TABLE t(purchase_id INT, cust_id INT,
                          date DATE, address TEXT, description TEXT);

INSERT INTO t(purchase_id, cust_id, date, address, description)
VALUES 
 ( 1, 5, '2021-01-01'::DATE ,'address1','desc1')
,( 2, 6, '2021-01-01'::DATE ,'address2','desc2')
,( 3, 5, '2021-02-01'::DATE ,'address1','desc3')
,( 4, 6, '2021-02-01'::DATE ,'address2','desc4')
,( 5, 5, '2021-03-01'::DATE ,'address3','desc5')
,( 6, 5, '2021-03-01'::DATE ,'address3','desc6')
,( 7, 5, '2021-04-01'::DATE ,'address1','desc7')
,( 8, 6, '2021-05-01'::DATE ,'address4','desc8');

Query:

SELECT *, 
 CONDITIONAL_CHANGE_EVENT(address) OVER (PARTITION BY CUST_ID ORDER BY DATE) AS CCE
FROM t
ORDER BY purchase_id;

Once the subgroups are identified by the CCE column, QUALIFY can be used to find the first row for each (CUST_ID, CCE).

Full query:

WITH cte AS (
 SELECT *,
  CONDITIONAL_CHANGE_EVENT(address) OVER (PARTITION BY CUST_ID ORDER BY DATE) AS CCE
 FROM t
)
SELECT *
FROM  cte
QUALIFY ROW_NUMBER() OVER(PARTITION BY CUST_ID, CCE ORDER BY DATE) = 1
ORDER BY purchase_id;

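Output: the first row of each (CUST_ID, CCE) block, which for the sample data corresponds to purchases 1, 2, 5, 7, and 8. Purchases 5 and 6 share a date, so adding purchase_id to the ORDER BY would make the choice within that block deterministic.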
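
For engines without CONDITIONAL_CHANGE_EVENT or QUALIFY, the same idea can be approximated with LAG plus a running SUM of change flags. The sketch below is an emulation under stated assumptions, not Snowflake's implementation: it runs on Python's built-in sqlite3 module (SQLite 3.25 or later), the names lagged, numbered, and ranked are made up for illustration, and purchase_id is added to each ORDER BY as a tie-breaker.

```python
import sqlite3

# Sample rows: (purchase_id, cust_id, date, address); dates as ISO strings.
ROWS = [
    (1, 5, "2021-01-01", "address1"), (2, 6, "2021-01-01", "address2"),
    (3, 5, "2021-02-01", "address1"), (4, 6, "2021-02-01", "address2"),
    (5, 5, "2021-03-01", "address3"), (6, 5, "2021-03-01", "address3"),
    (7, 5, "2021-04-01", "address1"), (8, 6, "2021-05-01", "address4"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t (purchase_id INTEGER, cust_id INTEGER, date TEXT, address TEXT)"
)
conn.executemany("INSERT INTO t VALUES (?, ?, ?, ?)", ROWS)

query = """
WITH lagged AS (
  -- Previous row's id and address within each customer.
  SELECT purchase_id, cust_id, date, address,
         LAG(purchase_id) OVER (PARTITION BY cust_id
                                ORDER BY date, purchase_id) AS prev_id,
         LAG(address)     OVER (PARTITION BY cust_id
                                ORDER BY date, purchase_id) AS prev_address
  FROM t
),
numbered AS (
  -- Running count of address changes, starting at 0 for the first row.
  SELECT *,
         SUM(CASE WHEN prev_id IS NOT NULL AND address IS NOT prev_address
                  THEN 1 ELSE 0 END)
           OVER (PARTITION BY cust_id ORDER BY date, purchase_id
                 ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS cce
  FROM lagged
),
ranked AS (
  -- Stand-in for QUALIFY: rank rows inside each (cust_id, cce) block.
  SELECT *,
         ROW_NUMBER() OVER (PARTITION BY cust_id, cce
                            ORDER BY date, purchase_id) AS rn
  FROM numbered
)
SELECT purchase_id, cce FROM ranked WHERE rn = 1 ORDER BY purchase_id
"""
block_starts = conn.execute(query).fetchall()
print(block_starts)  # [(1, 0), (2, 0), (5, 1), (7, 2), (8, 1)]
```

Each tuple is (purchase_id, change counter), so customer 5's return to address1 in purchase 7 shows up as that customer's second change.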

SQL: order by, then select the first row with distinct values for multiple columns