Haskell: Generating k-itemsets for Apriori

Time: 2013-03-20 05:58:40

Tags: haskell recursion apriori

I am trying to generate all the k-itemsets used in Apriori, following this pseudocode:

L1= {frequent items};
for (k= 2; Lk-1 !=∅; k++) do begin
    Ck= candidates generated from Lk-1 (that is: cartesian product Lk-1 x Lk-1 and eliminating any 
    k-1 size itemset that is not frequent);
    for each transaction t in database do
        increment the count of all candidates in 
         Ck that are contained in t
    Lk = candidates in Ck with min_sup
    end
return U_k Lk;

and this is my code:

-- d transactions, threshold 
kItemSets d thresh = kItemSets' 2  $ frequentItems d thresh
    where
        kItemSets' _ [] = [[]]
        kItemSets' k t  = ck ++ (kItemSets' (k+1) ck)
            where
                -- the k-length candidates that are a subset of at least thresh transactions in d
                ck = filter (\x->(countSubsets x d) >= thresh)  $ combinations k t

-- length n combinations that can be made from xs
combinations 0 _ = [[]]
combinations _ [] = []
combinations n xs@(y:ys)
  | n < 0     = []
  | otherwise = case drop (n-1) xs of
                  [ ] -> []
                  [_] -> [xs]
                  _   -> [y:c | c <- combinations (n-1) ys]
                            ++ combinations n ys

-- the items that occur at least o times in the dataset
frequentItems xs o = [y| y <- nub cs, x<-[count y cs], x >= o]
    where
        cs = concat xs

isSubset a b  = not $ any (`notElem` b) a

-- Count how many times the list y appears as a subset of a list of lists xs
countSubsets y xs = length $ filter (isSubset y) xs

count :: Eq a => a -> [a] -> Int
count x [] = 0
count x (y:ys) | x == y    = 1+(count x ys)
               | otherwise = count x ys

transactions =[["Butter", "Biscuits", "Cream", "Newspaper", "Bread", "Chocolate"],
          ["Cream", "Newspaper", "Tea", "Oil", "Chocolate"] ,
          ["Chocolate", "Cereal", "Bread"],
          ["Chocolate", "Flour", "Biscuits", "Newspaper"],
          ["Chocolate", "Biscuits", "Newspaper"] ]

But when I compile it, I get this error:

apriori.hs:5:51:
    Occurs check: cannot construct the infinite type: a0 = [a0]
    Expected type: [a0]
      Actual type: [[a0]]
    In the second argument of kItemSets', namely `ck'
    In the second argument of `(++)', namely `(kItemSets' (k + 1) ck)'
Failed, modules loaded: none.

But when I run the steps by hand in ghci, I get:

*Main> mapM_ print $ filter (\x->(countSubsets x transactions ) >= 2 ) $ combinations 2 $ frequentItems transactions 2
["Biscuits","Newspaper"]
["Biscuits","Chocolate"]
["Cream","Newspaper"]
["Cream","Chocolate"]
["Newspaper","Chocolate"]
["Bread","Chocolate"]

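which is correct, since these are the 2-itemsets that meet the occurrence threshold in the transaction set. But what I need for the 3-itemsets is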

[["Biscuits", "Chocolate", "Newspaper" ],
["Chocolate", "Cream", "Newspaper"]]

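and to have that appended to the list of 2-itemsets. How can I change my current code to achieve this? I know it can be built up from the 2-itemsets, but I don't know how to go about it.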

1 answer:

Answer 0: (score: 1)

I had to use this on line 5:

kItemSets' k t  =  ck  ++ (kItemSets' (k+1) $ nub $ concat ck)
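As far as I can tell, the reason this type-checks is that kItemSets' is first called with the flat list returned by frequentItems, so its second argument is a list of items, [a]. ck is a list of itemsets, [[a]], and passing it straight to the recursive call forces a = [a], which is the infinite type GHC complains about. Flattening ck with nub $ concat turns it back into a list of items. A minimal sketch of just that step (the name itemsOf is hypothetical, it is not part of the original code):

```haskell
import Data.List (nub)

-- Hypothetical helper (not in the original code): collapse a list of
-- itemsets into a duplicate-free list of the items they mention.
-- This is the same expression as `nub $ concat ck` in the fixed line.
itemsOf :: Eq a => [[a]] -> [a]
itemsOf = nub . concat
```

Items keep the order of their first occurrence, so the result only feeds candidate generation and says nothing about which itemsets were frequent.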

Not the most efficient, but it works.
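For reference, here is a sketch of the whole program with that line applied, so it can be loaded into ghci in one piece. Two things are my own changes rather than part of the fix above: the type signatures (with items assumed to be String), and the base case, which returns [] instead of [[]] so that the result does not end with an empty itemset. The local function is also renamed to go.

```haskell
import Data.List (nub)

-- Assumption for this sketch: items are strings, an itemset is a list of items.
type Item = String
type Itemset = [Item]

-- All frequent itemsets of size >= 2, smaller sizes first.
kItemSets :: [Itemset] -> Int -> [Itemset]
kItemSets d thresh = go 2 (frequentItems d thresh)
  where
    go :: Int -> [Item] -> [Itemset]
    go _ [] = []  -- changed from [[]]: no trailing empty itemset
    go k items = ck ++ go (k + 1) (nub (concat ck))
      where
        -- k-length candidates contained in at least thresh transactions
        ck = filter (\x -> countSubsets x d >= thresh) (combinations k items)

-- Length-n combinations that can be made from xs.
combinations :: Int -> [a] -> [[a]]
combinations 0 _ = [[]]
combinations _ [] = []
combinations n xs@(y:ys)
  | n < 0     = []
  | otherwise = case drop (n - 1) xs of
      []  -> []
      [_] -> [xs]
      _   -> [y : c | c <- combinations (n - 1) ys] ++ combinations n ys

-- Items that occur at least o times in the dataset.
frequentItems :: [Itemset] -> Int -> [Item]
frequentItems xs o = [y | y <- nub cs, count y cs >= o]
  where
    cs = concat xs

-- True when every element of a also occurs in b.
isSubset :: Eq a => [a] -> [a] -> Bool
isSubset a b = all (`elem` b) a

-- How many of the lists in xs contain y as a subset.
countSubsets :: Eq a => [a] -> [[a]] -> Int
countSubsets y xs = length (filter (isSubset y) xs)

-- How many times x occurs in a list.
count :: Eq a => a -> [a] -> Int
count x = length . filter (== x)

transactions :: [Itemset]
transactions =
  [ ["Butter", "Biscuits", "Cream", "Newspaper", "Bread", "Chocolate"]
  , ["Cream", "Newspaper", "Tea", "Oil", "Chocolate"]
  , ["Chocolate", "Cereal", "Bread"]
  , ["Chocolate", "Flour", "Biscuits", "Newspaper"]
  , ["Chocolate", "Biscuits", "Newspaper"]
  ]
```

With the transactions from the question and a threshold of 2, kItemSets transactions 2 should give the six 2-itemsets shown in the question followed by ["Biscuits","Newspaper","Chocolate"] and ["Newspaper","Chocolate","Cream"], which are the two requested 3-itemsets with their items in a different order. The inefficiency is that every k-combination of the surviving items is tested, rather than joining L(k-1) with itself as the pseudocode describes.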