Cannot coerce list with duplicated items to transactions :: Win7 SP1 64 :: R v3.0.2

Time: 2013-11-30 20:08:57

Tags: r transactions duplicates transformation apriori

Question

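I can't figure out how to convert a list to transactions for further processing by the apriori algorithm. I have a synthetic example that works and a real one (well, a subset of the Foodmart database) that doesn't; at the system level they look the same to me. Please help me convert the list to a transactions object.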

System setup

> version
platform       x86_64-w64-mingw32          
arch           x86_64                      
os             mingw32                     
system         x86_64, mingw32             
status                                     
major          3                           
minor          0.2                         
year           2013                        
month          09                          
day            25                          
svn rev        63987                       
language       R                           
version.string R version 3.0.2 (2013-09-25)
nickname       Frisbee Sailing        

Code to reproduce

Code that works

> a_list <- list(
    c("a","b","c"),
    c("a","b"),
    c("a","b","d"),
    c("c","e"),
    c("c","e"),
    c("a","b","d","e")
)

> a_trans <- as(a_list,"transactions")

> summary(a_trans)
transactions as itemMatrix in sparse format with
6 rows (elements/itemsets/transactions) and
5 columns (items) and a density of 0.5333333 
... and so on ...
2      b
3      c

> a_rules <- apriori(a_trans)

parameter specification:
confidence minval smax arem  aval originalSupport support minlen maxlen target   ext
... and so on ...
writing ... [17 rule(s)] done [0.00s].
creating S4 object  ... done [0.00s].

Code that does not work

> b_list <- list(
    c("PigTail Frozen Pepperoni Pizza","Bird Call Childrens Cold Remedy","Steady Silky Smooth Hair Conditioner","CDR Regular Coffee"),
    c("Horatio Graham Crackers","Excellent Apple Drink","Blue Medal Small Eggs","Cormorant Copper Cleaner","High Quality Copper Cleaner","Fast Apple Fruit Roll"),
    c("Toucan Canned Mixed Fruit","Landslide Salt","Gorilla Sour Cream","Hermanos Firm Tofu"),
    c("Swell Canned Mixed Fruit","Washington Diet Soda","Super Apple Jam","Plato Strawberry Preserves","Steady Whitening Toothpast","Steady Whitening Toothpast","Better Beef Soup","Hermanos Squash","Carrington Frozen Cheese Pizza","Fort West Fondue Mix","Best Choice Mini Donuts","Cormorant Copper Pot Scrubber","Ebony Cantelope","Denny D-Size Batteries","Akron Eyeglass Screwdriver"),
    c("Big Time Ice Cream Sandwich","Musial Mints","Portsmouth Imported Beer","CDR Vegetable Oil","Just Right Rice Soup","Carrington Frozen Peas","High Quality 100 Watt Lightbulb","Fort West Dried Dates"),
    c("Consolidated Tartar Control Toothpaste","Plato Tomato Sauce","Quick Seasoned Hamburger")
)

> b_trans <- as(b_list,"transactions")
Error in asMethod(object) : 
    can not coerce list with transactions with duplicated items

> summary(b_trans)
Error in summary(b_trans) : 
   error in evaluating the argument 'object' in selecting a method for function 'summary': Error: object 'b_trans' not found

An interesting observation

> duplicated(a_list)
[1] FALSE FALSE FALSE FALSE  TRUE FALSE

> duplicated(b_list)
[1] FALSE FALSE FALSE FALSE FALSE FALSE

Why does this fabulous (WTF) thing happen?

1 answer:

Answer 0: (score: 3)

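As joran and DWin pointed out:

  • Within each character vector of a_list, the elements are unique.
  • One of the vectors in b_list contains a duplicated element.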
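
The distinction matters because `duplicated()` applied to a list compares whole elements (here, whole vectors), so it does not flag an item repeated inside a single vector. A minimal base R sketch of the difference, assuming a toy list `x` and a hypothetical helper named `has_dup_items` (neither appears in the original thread):

```r
# Toy data: the first vector repeats "b"; the 4th and 5th vectors are identical.
x <- list(
    c("a", "b", "b", "c"),
    c("a", "b"),
    c("a", "b", "d"),
    c("c", "e"),
    c("c", "e")
)

# duplicated() on a list compares whole vectors, not the items inside them.
whole_vector_dups <- duplicated(x)

# Hypothetical helper: TRUE for every vector that contains a repeated item.
has_dup_items <- function(lst) {
    vapply(lst, function(v) anyDuplicated(v) > 0L, logical(1))
}

within_vector_dups <- has_dup_items(x)

# Positions of the vectors that would break the coercion to transactions.
which(within_vector_dups)
```

Only the second check is relevant to the coercion error; identical transactions (such as the two `c("c","e")` baskets) are perfectly legal.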

That is indeed the case. If I add a second "b" to the first vector, giving a_list2:

> a_list2 <- list(
    c("a","b","b","c"),
    c("a","b"),
    c("a","b","d"),
    c("c","e"),
    c("c","e"),
    c("a","b","d","e")
)

I get an error when trying to convert the data:

> a_trans2 <- as(a_list2,"transactions")
Error in asMethod(object) : 
    can not coerce list with transactions with duplicated items

It turns out that b_list mentions "Steady Whitening Toothpast" twice in its fourth vector. Manually removing this duplicate (giving b_list2) solved the problem.
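
Rather than editing the list by hand, the repeated items could also be dropped programmatically. The following is only a sketch: `b_small` is a shortened, hypothetical stand-in for b_list, and the coercion step is guarded because it assumes the arules package is installed.

```r
# Shortened stand-in for b_list: the second vector repeats one item.
b_small <- list(
    c("Horatio Graham Crackers", "Excellent Apple Drink"),
    c("Steady Whitening Toothpast", "Steady Whitening Toothpast", "Better Beef Soup"),
    c("Plato Tomato Sauce", "Quick Seasoned Hamburger")
)

# unique() keeps the first occurrence of each item within every vector.
b_small2 <- lapply(b_small, unique)

# The "transactions" class comes from arules, so coerce only if it is available.
if (requireNamespace("arules", quietly = TRUE)) {
    library(arules)
    b_small_trans <- as(b_small2, "transactions")
    print(summary(b_small_trans))
}
```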

> b_trans2 <- as(b_list2,"transactions")
> summary(b_trans2)
transactions as itemMatrix in sparse format with
6 rows (elements/itemsets/transactions) and
... and so on ...
2    Best Choice Mini Donuts
3           Better Beef Soup

As for handling the real data, the following code runs without any errors.

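# Group product names by transaction ID
aggrData <- split(selData$product_name,selData$transaction_id)

# Drop repeated items within each transaction and convert to character
listData <- list()
for (i in seq_along(aggrData)) {
    listData[[i]] <- as.character(aggrData[[i]][!duplicated(aggrData[[i]])])
}

# Coerce the de-duplicated list to a transactions object (requires arules)
trnsData <- as(listData,"transactions")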
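
The split-and-loop step could also be written with `lapply()`. The sketch below uses a small made-up data frame in place of the real `selData` (only the column names `transaction_id` and `product_name` come from the code above; the rows are invented for illustration). Unlike the loop, this variant keeps the transaction IDs as list names.

```r
# Toy stand-in for selData; product_name is a factor, as was the default in R 3.x.
selData <- data.frame(
    transaction_id = c(1, 1, 1, 2, 2, 3),
    product_name   = factor(c("Milk", "Bread", "Milk", "Eggs", "Eggs", "Tea"))
)

# One vector of product names per transaction ID.
aggrData <- split(selData$product_name, selData$transaction_id)

# Convert each group to character and drop repeated items in one pass.
listData <- lapply(aggrData, function(items) unique(as.character(items)))

str(listData)
```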

However, neither the following line nor attempts with other parameters produced any rules.

> rules <- apriori(trnsData)

parameter specification:
... and so on ...
writing ... [0 rule(s)] done [0.00s].
creating S4 object  ... done [0.00s].

That, however, is a completely different story.
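
For readers who hit the same empty result: one common cause, which the thread does not confirm for this data set, is that every item is too rare for the default `apriori()` thresholds (minimum support 0.1 and minimum confidence 0.8). The sketch below uses invented toy baskets and a hypothetical helper `item_support` to check item supports in base R; the lowered thresholds in the guarded arules call are purely illustrative.

```r
# Toy baskets: 40 transactions of two items each, so no item occurs more than twice.
toy <- lapply(1:40, function(i) paste0("item", c(i, i + 1)))

# Hypothetical helper: fraction of transactions that contain each item.
item_support <- function(tx) {
    tx <- lapply(tx, unique)
    counts <- table(unlist(tx))
    setNames(as.numeric(counts) / length(tx), names(counts))
}

supp <- item_support(toy)

# If even the most frequent item is below the minimum support,
# apriori() cannot find any frequent itemset and returns no rules.
max(supp)

# Illustrative thresholds only; requires the arules package.
if (requireNamespace("arules", quietly = TRUE)) {
    library(arules)
    trns <- as(toy, "transactions")
    rules <- apriori(trns, parameter = list(support = 0.02, confidence = 0.5))
}
```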