Creating Apriori itemsets on Redshift

Time: 2018-10-01 21:02:14

Tags: sql amazon-redshift apriori

My goal is to use the Apriori algorithm to find interesting insights in a purchase table created on AWS Redshift. The purchase table looks like this:

-------------
ID | product
1    A
1    B
1    C
2    A
2    C

I am able to compute the frequency of each product and filter out the observations with low frequency. However, I am struggling to generate the itemsets in the AWS Redshift environment. Starting from the table above, I want to get the combinations of products bought under the same ID, together with their counts.

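The purchase table contains more than 1,000 products, so I would like to learn how to write a correct and efficient query for this problem. Thanks.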

1 Answer:

Answer 0: (score: 3)

Use a self-join:

select t1.product, t2.product, count(*)
from t t1 join
     t t2
     on t1.id = t2.id and t1.product < t2.product
group by t1.product, t2.product;

This puts the itemsets in two columns. You can also concatenate them:

select t1.product || ',' || t2.product, count(*)
from t t1 join
     t t2
     on t1.id = t2.id and t1.product < t2.product
group by t1.product, t2.product
order by t1.product || ',' || t2.product;

Here is a SQL Fiddle showing that the code works.
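
Since the question mentions filtering out low-frequency products and a table with more than 1,000 products, here is a minimal sketch of how that filter might be combined with the self-join, so that infrequent products are dropped before pairs are formed. It is an illustration under stated assumptions, not a tuned Redshift solution: an in-memory SQLite database stands in for Redshift so the SQL can be run locally, the table `t` and its columns `id` and `product` follow the queries above, and the names `frequent`, `filtered`, `min_support`, `frequent_pairs` and `demo_connection` are hypothetical.

```python
import sqlite3

# Assumption: SQLite stands in for Redshift; the SQL uses only a WITH clause,
# a self-join and the || operator. min_support is a hypothetical threshold.
PAIR_SQL = """
with frequent as (      -- products that appear under at least :min_support IDs
    select product
    from t
    group by product
    having count(distinct id) >= :min_support
),
filtered as (           -- purchases restricted to frequent products, de-duplicated
    select distinct t.id, t.product
    from t join
         frequent f
         on t.product = f.product
)
select a.product || ',' || b.product as itemset, count(*) as support
from filtered a join
     filtered b
     on a.id = b.id and a.product < b.product
group by a.product, b.product
having count(*) >= :min_support
order by itemset;
"""


def frequent_pairs(conn, min_support):
    """Return (itemset, support) rows for product pairs meeting min_support."""
    return conn.execute(PAIR_SQL, {"min_support": min_support}).fetchall()


def demo_connection():
    """Build an in-memory table holding the sample rows from the question."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table t (id integer, product text)")
    conn.executemany(
        "insert into t values (?, ?)",
        [(1, "A"), (1, "B"), (1, "C"), (2, "A"), (2, "C")],
    )
    return conn


if __name__ == "__main__":
    conn = demo_connection()
    print(frequent_pairs(conn, 1))  # [('A,B', 1), ('A,C', 2), ('B,C', 1)]
    print(frequent_pairs(conn, 2))  # [('A,C', 2)]
```

With a threshold of 1 this reproduces the pair counts from the concatenated query above; with a threshold of 2 only `A,C` remains. Pruning infrequent products first is the usual Apriori idea: a pair cannot be frequent unless both of its products are, so the self-join runs over fewer rows.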