Question

Situation:

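I need to add two flag columns that indicate:

Whether the person bought the same product before the purchase date.
Whether the person bought any product (the same one or a different one) before the purchase date.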

The output should contain 5 columns:

Email
ProductName
DatePurchased
SameProduct (0 = no, 1 = yes)
AnyProduct (0 = no, 1 = yes)

The raw data looks like this:

abc@gmail.com   cucumber    01-02-2019
abc@gmail.com   orange      04-02-2019
abc@gmail.com   grapefruit  15-02-2019
cde@gmail.com   blackberry  06-02-2019
cde@gmail.com   lime        15-02-2019
cde@gmail.com   lime        20-02-2019
zzz@gmail.com   apple       02-02-2019
zzz@gmail.com   apple       18-02-2019
zzz@gmail.com   orange      19-02-2019
zzz@gmail.com   apple       28-02-2019

Goal:

I want my output to look like this:

Email           ProductName DatePurchased   SameProduct     AnyProduct
abc@gmail.com   cucumber    01-02-2019      0               0
abc@gmail.com   orange      04-02-2019      0               1
abc@gmail.com   grapefruit  15-02-2019      0               1
cde@gmail.com   blackberry  06-02-2019      0               0
cde@gmail.com   lime        15-02-2019      0               1
cde@gmail.com   lime        20-02-2019      1               1
zzz@gmail.com   apple       02-02-2019      0               0   
zzz@gmail.com   apple       18-02-2019      1               1   
zzz@gmail.com   orange      19-02-2019      0               1
zzz@gmail.com   apple       28-02-2019      1               1
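To pin down the expected flags before comparing queries, here is a minimal plain-Python sketch (not SQL, and not part of the original question) that walks each person's purchases in date order and remembers what they have already bought. It assumes ISO dates (YYYY-MM-DD) so that string order is chronological, and that nobody buys twice on the same day; the names `ROWS` and `flag_purchases` are hypothetical.

```python
from collections import defaultdict

# Sample data from the question, with dates rewritten as ISO strings.
ROWS = [
    ("abc@gmail.com", "cucumber", "2019-02-01"),
    ("abc@gmail.com", "orange", "2019-02-04"),
    ("abc@gmail.com", "grapefruit", "2019-02-15"),
    ("cde@gmail.com", "blackberry", "2019-02-06"),
    ("cde@gmail.com", "lime", "2019-02-15"),
    ("cde@gmail.com", "lime", "2019-02-20"),
    ("zzz@gmail.com", "apple", "2019-02-02"),
    ("zzz@gmail.com", "apple", "2019-02-18"),
    ("zzz@gmail.com", "orange", "2019-02-19"),
    ("zzz@gmail.com", "apple", "2019-02-28"),
]


def flag_purchases(rows):
    """Return (email, product, date, same_product, any_product) tuples."""
    bought_by = defaultdict(set)  # email -> products bought on earlier rows
    result = []
    # Process each person's purchases in chronological order.
    for email, product, date in sorted(rows, key=lambda r: (r[0], r[2])):
        earlier = bought_by[email]
        same_product = 1 if product in earlier else 0
        any_product = 1 if earlier else 0
        result.append((email, product, date, same_product, any_product))
        earlier.add(product)  # remember this purchase for later rows
    return result


for row in flag_purchases(ROWS):
    print(*row)
```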

What I tried: I tried joining the table to itself twice and using CASE statements, but that feels extremely inefficient.
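The question does not show the self-join query. As a rough, hypothetical illustration of that row-by-row lookup idea, the sketch below uses correlated EXISTS subqueries instead of explicit joins, and runs on SQLite through Python's built-in sqlite3 module rather than SQL Server, so `#table1` becomes `table1` and dates are stored as ISO text. The strict `<` implements "before the purchase date". Each output row probes the table twice, which is the repeated work the window-function answers try to avoid.

```python
import sqlite3

ROWS = [
    ("abc@gmail.com", "cucumber", "2019-02-01"),
    ("abc@gmail.com", "orange", "2019-02-04"),
    ("abc@gmail.com", "grapefruit", "2019-02-15"),
    ("cde@gmail.com", "blackberry", "2019-02-06"),
    ("cde@gmail.com", "lime", "2019-02-15"),
    ("cde@gmail.com", "lime", "2019-02-20"),
    ("zzz@gmail.com", "apple", "2019-02-02"),
    ("zzz@gmail.com", "apple", "2019-02-18"),
    ("zzz@gmail.com", "orange", "2019-02-19"),
    ("zzz@gmail.com", "apple", "2019-02-28"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (email TEXT, productname TEXT, datepurchased TEXT)")
conn.executemany("INSERT INTO table1 VALUES (?, ?, ?)", ROWS)

# Each flag asks: is there an earlier row for the same email
# (and, for SameProduct, the same product)?
QUERY = """
SELECT t.email, t.productname, t.datepurchased,
       CASE WHEN EXISTS (SELECT 1 FROM table1 p
                         WHERE p.email = t.email
                           AND p.productname = t.productname
                           AND p.datepurchased < t.datepurchased)
            THEN 1 ELSE 0 END AS SameProduct,
       CASE WHEN EXISTS (SELECT 1 FROM table1 p
                         WHERE p.email = t.email
                           AND p.datepurchased < t.datepurchased)
            THEN 1 ELSE 0 END AS AnyProduct
FROM table1 t
ORDER BY t.email, t.datepurchased
"""

result = conn.execute(QUERY).fetchall()
for row in result:
    print(*row)
```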

Dummy data:

create table #table1 (email varchar(20), productname varchar(20), datepurchased date)
insert into #table1 values
('abc@gmail.com','cucumber','2019-02-01'),
('abc@gmail.com','orange','2019-02-04'),
('abc@gmail.com','grapefruit','2019-02-15'),
('cde@gmail.com','blackberry','2019-02-06'),
('cde@gmail.com','lime','2019-02-15'),
('cde@gmail.com','lime','2019-02-20'),
('zzz@gmail.com','apple','2019-02-02'),
('zzz@gmail.com','apple','2019-02-18'),
('zzz@gmail.com','orange','2019-02-19'),
('zzz@gmail.com','apple','2019-02-28')

Note: my real data has more than 100 million rows. I'm not sure which query would make the processing finish as quickly as possible.

Answer 1

Another option for getting the result.

I use ROW_NUMBER() - 1 so that the first occurrence of a value gets a zero. Then I use SIGN() to turn any positive value into 1.

SELECT *,
    SameProduct = SIGN(ROW_NUMBER() OVER(PARTITION BY email, productname ORDER BY datepurchased)-1),
    AnyProduct  = SIGN(ROW_NUMBER() OVER(PARTITION BY email ORDER BY datepurchased)-1)
FROM #table1
ORDER BY email, datepurchased;
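A runnable sketch of this answer's query, assuming SQLite (via Python's sqlite3 module) as a stand-in for SQL Server. Three T-SQL details are adapted: `#table1` is renamed `table1`, the `SameProduct = ...` alias form becomes `... AS SameProduct` (SQLite does not support the former), and a Python `sign` function is registered with `create_function` because not every SQLite build provides `SIGN()`.

```python
import sqlite3

ROWS = [
    ("abc@gmail.com", "cucumber", "2019-02-01"),
    ("abc@gmail.com", "orange", "2019-02-04"),
    ("abc@gmail.com", "grapefruit", "2019-02-15"),
    ("cde@gmail.com", "blackberry", "2019-02-06"),
    ("cde@gmail.com", "lime", "2019-02-15"),
    ("cde@gmail.com", "lime", "2019-02-20"),
    ("zzz@gmail.com", "apple", "2019-02-02"),
    ("zzz@gmail.com", "apple", "2019-02-18"),
    ("zzz@gmail.com", "orange", "2019-02-19"),
    ("zzz@gmail.com", "apple", "2019-02-28"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (email TEXT, productname TEXT, datepurchased TEXT)")
conn.executemany("INSERT INTO table1 VALUES (?, ?, ?)", ROWS)

# Stand-in for T-SQL SIGN(): returns -1, 0 or 1.
conn.create_function("sign", 1, lambda x: (x > 0) - (x < 0))

# ROW_NUMBER() - 1 is 0 for the first row of each partition and positive
# afterwards; SIGN() squashes every positive value to 1.
QUERY = """
SELECT email, productname, datepurchased,
       SIGN(ROW_NUMBER() OVER (PARTITION BY email, productname
                               ORDER BY datepurchased) - 1) AS SameProduct,
       SIGN(ROW_NUMBER() OVER (PARTITION BY email
                               ORDER BY datepurchased) - 1) AS AnyProduct
FROM table1
ORDER BY email, datepurchased
"""

result = conn.execute(QUERY).fetchall()
for row in result:
    print(*row)
```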

If you prefer, you can cast to bit instead. That gives the same result as SIGN(), but only because none of the values here are negative.

SELECT *,
    SameProduct = CAST(ROW_NUMBER() OVER(PARTITION BY email, productname ORDER BY datepurchased)-1 AS bit),
    AnyProduct  = CAST(ROW_NUMBER() OVER(PARTITION BY email ORDER BY datepurchased)-1 AS bit)
FROM #table1
ORDER BY email, datepurchased;

Answer 2

My solution uses LAG() and ROW_NUMBER().

LAG() always refers to the previous record, so it is useful for checking whether the previous and current product are equal.

ROW_NUMBER() is only used to flag the first purchase (row number = 1).

Of course, the PARTITION BY and ORDER BY clauses are important for getting the records in the right order.

I also checked Vamsi Prabhala's solution, but IIF seems to perform faster than CASE-WHEN.
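The SQL for this answer did not survive, so the sketch below is a hypothetical reconstruction of the described idea, not the author's query. It runs on SQLite via Python's sqlite3 module and uses CASE rather than IIF. One detail matters: if LAG() is partitioned by email alone and compared with the current product, only back-to-back repeats are caught. zzz@gmail.com's apple on 28-02-2019 follows an orange, so that comparison (the `PrevRowSame` column, included only for contrast) yields 0 where the expected SameProduct is 1. Partitioning LAG() by email and productname and testing for NULL avoids this.

```python
import sqlite3

ROWS = [
    ("abc@gmail.com", "cucumber", "2019-02-01"),
    ("abc@gmail.com", "orange", "2019-02-04"),
    ("abc@gmail.com", "grapefruit", "2019-02-15"),
    ("cde@gmail.com", "blackberry", "2019-02-06"),
    ("cde@gmail.com", "lime", "2019-02-15"),
    ("cde@gmail.com", "lime", "2019-02-20"),
    ("zzz@gmail.com", "apple", "2019-02-02"),
    ("zzz@gmail.com", "apple", "2019-02-18"),
    ("zzz@gmail.com", "orange", "2019-02-19"),
    ("zzz@gmail.com", "apple", "2019-02-28"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (email TEXT, productname TEXT, datepurchased TEXT)")
conn.executemany("INSERT INTO table1 VALUES (?, ?, ?)", ROWS)

QUERY = """
SELECT email, productname, datepurchased,
       -- No earlier row in this (email, product) partition -> LAG() is NULL.
       CASE WHEN LAG(datepurchased) OVER (PARTITION BY email, productname
                                          ORDER BY datepurchased) IS NULL
            THEN 0 ELSE 1 END AS SameProduct,
       -- Only the first purchase per email gets row number 1.
       CASE WHEN ROW_NUMBER() OVER (PARTITION BY email
                                    ORDER BY datepurchased) = 1
            THEN 0 ELSE 1 END AS AnyProduct,
       -- For contrast only: compares with the previous purchase of any product.
       CASE WHEN LAG(productname) OVER (PARTITION BY email
                                        ORDER BY datepurchased) = productname
            THEN 1 ELSE 0 END AS PrevRowSame
FROM table1
ORDER BY email, datepurchased
"""

result = conn.execute(QUERY).fetchall()
for row in result:
    print(*row)
```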

Answer 3

One approach uses either the COUNT window function or ROW_NUMBER.

--count
select t.*
       ,case when count(*) over(partition by email,productname order by datepurchased) > 1 then 1 else 0 end as same_prev
       ,case when count(*) over(partition by email order by datepurchased) > 1 then 1 else 0 end as any_prev
from tbl t

--row_number
select t.*
       ,case when row_number() over(partition by email,productname order by datepurchased) > 1 then 1 else 0 end as same_prev
       ,case when row_number() over(partition by email order by datepurchased) > 1 then 1 else 0 end as any_prev
from tbl t
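Both variants can be checked side by side with a sketch like the following. It assumes SQLite via Python's sqlite3 module, creates the table under the answer's name `tbl`, and adds an ORDER BY (not in the answer) so the rows come back in a stable order. One caveat: COUNT(*) with ORDER BY uses the default RANGE frame, so two purchases by the same person on the same date would count each other and both be flagged, whereas ROW_NUMBER() would flag only one of them. The sample data has no such ties.

```python
import sqlite3

ROWS = [
    ("abc@gmail.com", "cucumber", "2019-02-01"),
    ("abc@gmail.com", "orange", "2019-02-04"),
    ("abc@gmail.com", "grapefruit", "2019-02-15"),
    ("cde@gmail.com", "blackberry", "2019-02-06"),
    ("cde@gmail.com", "lime", "2019-02-15"),
    ("cde@gmail.com", "lime", "2019-02-20"),
    ("zzz@gmail.com", "apple", "2019-02-02"),
    ("zzz@gmail.com", "apple", "2019-02-18"),
    ("zzz@gmail.com", "orange", "2019-02-19"),
    ("zzz@gmail.com", "apple", "2019-02-28"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (email TEXT, productname TEXT, datepurchased TEXT)")
conn.executemany("INSERT INTO tbl VALUES (?, ?, ?)", ROWS)

# Same query shape for both variants; {fn} is swapped for the window function.
TEMPLATE = """
SELECT t.*,
       CASE WHEN {fn} OVER (PARTITION BY email, productname
                            ORDER BY datepurchased) > 1 THEN 1 ELSE 0 END AS same_prev,
       CASE WHEN {fn} OVER (PARTITION BY email
                            ORDER BY datepurchased) > 1 THEN 1 ELSE 0 END AS any_prev
FROM tbl t
ORDER BY email, datepurchased
"""

count_rows = conn.execute(TEMPLATE.format(fn="COUNT(*)")).fetchall()
rn_rows = conn.execute(TEMPLATE.format(fn="ROW_NUMBER()")).fetchall()
for row in count_rows:
    print(*row)
```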

Answer 4

I would use row_number():

select t.*,
       (case when 1 = row_number() over (partition by email, productname order by datepurchased) 
             then 0 else 1
        end) as same_product,
       (case when 1 = row_number() over (partition by email order by datepurchased) 
             then 0 else 1
        end) as any_product
from #table1 t;

Note that the only difference between the two expressions is the partition by clause.

You can also do this without a case expression:

select t.*,
       coalesce(max(1) over (partition by email, productname order by datepurchased rows between unbounded preceding and 1 preceding), 0) as same_product,
       coalesce(max(1) over (partition by email order by datepurchased rows between unbounded preceding and 1 preceding), 0) as any_product
from #table1 t
order by email, datepurchased;
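A sketch of the case-free variant, again assuming SQLite via Python's sqlite3 module (with `table1` in place of `#table1`). The frame `ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING` covers only earlier rows, so it is empty for a person's first purchase (or their first purchase of that product); MAX(1) then returns NULL and COALESCE turns it into 0.

```python
import sqlite3

ROWS = [
    ("abc@gmail.com", "cucumber", "2019-02-01"),
    ("abc@gmail.com", "orange", "2019-02-04"),
    ("abc@gmail.com", "grapefruit", "2019-02-15"),
    ("cde@gmail.com", "blackberry", "2019-02-06"),
    ("cde@gmail.com", "lime", "2019-02-15"),
    ("cde@gmail.com", "lime", "2019-02-20"),
    ("zzz@gmail.com", "apple", "2019-02-02"),
    ("zzz@gmail.com", "apple", "2019-02-18"),
    ("zzz@gmail.com", "orange", "2019-02-19"),
    ("zzz@gmail.com", "apple", "2019-02-28"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (email TEXT, productname TEXT, datepurchased TEXT)")
conn.executemany("INSERT INTO table1 VALUES (?, ?, ?)", ROWS)

# MAX(1) over "all earlier rows": 1 if any earlier row exists, else NULL.
QUERY = """
SELECT t.*,
       COALESCE(MAX(1) OVER (PARTITION BY email, productname ORDER BY datepurchased
                             ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING), 0) AS same_product,
       COALESCE(MAX(1) OVER (PARTITION BY email ORDER BY datepurchased
                             ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING), 0) AS any_product
FROM table1 t
ORDER BY email, datepurchased
"""

result = conn.execute(QUERY).fetchall()
for row in result:
    print(*row)
```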

Here is a db<>fiddle.

Flag whether a person has bought the same product before and whether they have bought any product before