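I have a dataset that looks like this: click here. The image shows only a small part of the data. I want to write a function that takes the dataset and k (the number of items each frequent itemset will contain) as input and returns the k-frequent itemsets together with their support. I tried this, where a is the output for the 1-frequent itemsets:
from itertools import combinations,product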
col_list = list(df.columns) # df is my dataframe
min_support = 8.43 # minimum support given
def freq_set(df, n):
    if n == 1:
        return(a) # a is the output of 1-frequent itemset
    l = []
    support = {}
    col_comb = list(combinations(col_list,n))
    for i in col_comb:
        s = df.groupby(list(i)).size()
        L = s.index[s > min_support].tolist()
        k = 0
        for m in L:
            temp = list(combinations(m,n-1))
            for j in temp:
                if j in freq_set(df,n-1)[0]:
                    l.append(m)
                    support[m] = s.loc[L].values[k]
                    break
            k += 1
    return(l,support)
But it does not work.
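To make the expected output concrete, here is a minimal sketch of the behaviour described above. It is not the attempt shown earlier. It assumes that an item is a (column, value) pair, that support means the number of rows containing the itemset, and that an itemset is frequent when that count is strictly greater than min_support. The name freq_itemsets and the toy DataFrame are hypothetical and only serve as illustration. The sketch counts each k-column combination directly with groupby instead of recursing over smaller itemsets.

```python
from itertools import combinations

import pandas as pd


def freq_itemsets(df, k, min_support):
    """Return {itemset: support} for itemsets made of exactly k items.

    Assumptions (not from the original post): an item is a (column, value)
    pair and support is the number of rows that contain every item.
    """
    result = {}
    for cols in combinations(df.columns, k):
        # Count how often each combination of values occurs in these columns.
        counts = df.groupby(list(cols)).size()
        for values, count in counts[counts > min_support].items():
            if k == 1:
                # With a single column the index label is a scalar, not a tuple.
                values = (values,)
            result[tuple(zip(cols, values))] = int(count)
    return result


# Hypothetical toy data, only to show the shape of the output.
toy = pd.DataFrame({
    "color": ["red", "red", "blue", "red"],
    "size": ["S", "S", "S", "M"],
})
pairs = freq_itemsets(toy, 2, min_support=1)
singles = freq_itemsets(toy, 1, min_support=1)
```

With the toy data, pairs holds the single itemset (("color", "red"), ("size", "S")) with support 2. To express support as a fraction instead, divide each count by len(df).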