Alternative to self-join

Time: 2019-08-21 08:27:00

Tags: sql google-bigquery

I have a table, supplynetwork, with four columns:

CustomerID, SupplierID, Supplier_productID, Purchase_Year

I want to build customer pairs in which both customers purchased the same product from the same supplier in the same year. I use a self-join in BigQuery to do this, but it is too slow. Are there any alternatives?

3 answers:

Answer 0: (score: 0)

Use join syntax and index the CustomerID column

select distinct
  a.CustomerID as focal_CustomerID,
  b.CustomerID as linked_CustomerID,
  a.Purchase_Year,
  a.Supplier_productID
from 
  supplynetwork as a join
  supplynetwork as b
  on   
  a.Purchase_Year=b.Purchase_Year and
  a.Supplier_productID=b.Supplier_productID and
  a.SupplierID=b.SupplierID
  where a.CustomerID<>b.CustomerID 

Answer 1: (score: 0)

You can use aggregation to get all the customers that meet the condition in a single row:

select Purchase_Year, Supplier_productID, SupplierID,
       array_agg(distinct CustomerID) as customers
from supplynetwork sn
group by Purchase_Year, Supplier_productID, SupplierID;

Then you can use array operations to get the pairs:

with pss as (
      select Purchase_Year, Supplier_productID, SupplierID,
             array_agg(distinct CustomerID) as customers
      from supplynetwork sn
      group by Purchase_Year, Supplier_productID, SupplierID
     )
select c1, c2, pss.*
from pss cross join
     unnest(pss.customers) c1 cross join
     unnest(pss.customers) c2
where c1 < c2;
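
To see what the aggregate-then-unnest query returns, here is a minimal sketch that runs it against a few inline rows. The sample values ('A', 'S1', 'P1', and so on) are hypothetical and only for illustration. The table and column names follow the question, and the sketch assumes BigQuery standard SQL.

```sql
-- Hypothetical sample data standing in for the real supplynetwork table.
with supplynetwork as (
  select 'A' as CustomerID, 'S1' as SupplierID, 'P1' as Supplier_productID, 2018 as Purchase_Year union all
  select 'B', 'S1', 'P1', 2018 union all
  select 'C', 'S1', 'P1', 2018 union all
  select 'A', 'S1', 'P1', 2018 union all  -- repeated purchase, collapsed by distinct
  select 'D', 'S2', 'P9', 2019            -- only one customer in this group, so no pair
),
-- One row per (year, product, supplier) with the distinct customers as an array.
pss as (
  select Purchase_Year, Supplier_productID, SupplierID,
         array_agg(distinct CustomerID) as customers
  from supplynetwork
  group by Purchase_Year, Supplier_productID, SupplierID
)
-- Pair up the array elements within each group; c1 < c2 keeps each pair once.
select c1, c2, pss.Purchase_Year, pss.Supplier_productID, pss.SupplierID
from pss cross join
     unnest(pss.customers) c1 cross join
     unnest(pss.customers) c2
where c1 < c2
order by c1, c2;
```

On this sample the result has three rows: (A, B), (A, C) and (B, C), all for 2018, P1, S1. Because the filter is c1 < c2, each pair appears once. A <> or != filter would return every pair in both orders, doubling the row count.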

Answer 2: (score: 0)

You can use a CROSS JOIN, which, even though it is a Cartesian product, may still give you the benefit of simplification. Try the query below and see whether it is cheaper than your baseline:

select focal_CustomerID, linked_CustomerID, Purchase_Year, Supplier_ProductID
from (
  select SupplierID, Supplier_ProductID, Purchase_Year,
         array_agg(distinct CustomerID) as Customers
  from `mydataset.mytable`
  group by 1,2,3
),
unnest(Customers) focal_CustomerID
cross join unnest(Customers) linked_CustomerID
where focal_CustomerID != linked_CustomerID