I am playing with a sample dataset that looks like this:
I have successfully transformed the data using the following:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
from mlxtend.frequent_patterns import apriori
from mlxtend.frequent_patterns import association_rules
import mlxtend as ml
retail_data = pd.read_excel('Copy of FastFood_Transactions.xlsx')
sns.countplot(x = 'Product', data = retail_data, order = retail_data['Product'].value_counts().iloc[:10].index)
plt.xticks(rotation=90)
retail_data = retail_data.groupby(['Transaction ID','Product']).size().reset_index(name='count')
basket = (retail_data.groupby(['Transaction ID', 'Product'])['count']
          .sum().unstack().reset_index().fillna(0)
          .set_index('Transaction ID'))
# The encoding function
def encode_units(x):
    if x <= 0:
        return 0
    if x >= 1:
        return 1

basket_sets = basket.applymap(encode_units)
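As far as I can tell, the same encoding can also be written as one vectorized comparison instead of a per-cell function call. This is only a minimal sketch on a toy frame; the item names and counts below are made up for illustration and are not taken from the spreadsheet.

```python
import pandas as pd

# Toy basket of per-transaction item counts (hypothetical values).
basket = pd.DataFrame(
    {"Hot Wings": [2.0, 0.0, 1.0], "Filet Bites": [0.0, 3.0, 0.0]},
    index=pd.Index([101, 102, 103], name="Transaction ID"),
)

# Any positive count becomes True, zero becomes False.
basket_sets = basket.gt(0)
print(basket_sets)
```

The result is a boolean frame rather than 0/1 integers, which should carry the same information for this purpose.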
I verified my setup and it looks fine.
However, when I try to apply apriori and association rules as follows:
frequent_itemsets = apriori(basket_sets, min_support=0.07, use_colnames=True)
rules = association_rules(frequent_itemsets, metric="lift", min_threshold=1)
rules.head()
the rules come back empty.
I tried inspecting the support values and itemsets returned by apriori:
   support   itemsets
0  0.084238  (3 Crispy Strips (incl. dip))
1  0.080320  (9 Hot Wings)
2  0.061100  (9 Hot Wings)
3  0.058294  (10 Filet Bites)
4  0.080214  (Chicken Original 1 stuk)
I noticed that the support values are very low. Could this be because the dataset is sparse?
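To illustrate what I suspect is happening, here is a small standard-library sketch that counts the support of single items and of pairs. The transactions and the threshold are hypothetical, chosen only to show the effect: every single item clears the threshold, but no pair does, so only one-item itemsets remain, and a rule needs at least two items (an antecedent and a consequent).

```python
from collections import Counter
from itertools import combinations

# Hypothetical transactions, made up for illustration.
transactions = [
    {"Hot Wings"},
    {"Filet Bites"},
    {"Hot Wings", "Filet Bites"},
    {"Crispy Strips"},
    {"Hot Wings"},
    {"Crispy Strips"},
    {"Filet Bites"},
    {"Hot Wings"},
    {"Crispy Strips"},
    {"Filet Bites"},
]
min_support = 0.2  # assumed threshold for this toy example
n = len(transactions)


def support_by_size(size):
    """Return the support of every item combination of the given size."""
    counts = Counter()
    for items in transactions:
        for combo in combinations(sorted(items), size):
            counts[combo] += 1
    return {combo: count / n for combo, count in counts.items()}


singles = support_by_size(1)
pairs = support_by_size(2)

# Keep only the combinations that meet the support threshold.
frequent_singles = {k: v for k, v in singles.items() if v >= min_support}
frequent_pairs = {k: v for k, v in pairs.items() if v >= min_support}

print("frequent single items:", frequent_singles)
print("frequent pairs:", frequent_pairs)
```

If my real data behaves like this toy example, every frequent itemset would contain a single item, which would explain the empty rules.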