Preparing a transaction list for arules

Date: 2013-03-25 07:50:09

Tags: r arules

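arules needs a list of transactions. Each row of the list contains a set of products, and not every transaction has the same number of products. It sounds like a pivot, but it is not. An example can be found here.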

I want something like aggregate(dvd , by=list("ID"), FUN=c), but it fails with the error "arguments must have same length".

Here is my data:

> dvd
   ID          Item
1   1   Sixth Sense
2   1         LOTR1
3   1 Harry Potter1
4   1    Green Mile
5   1         LOTR2
6   2     Gladiator
7   2       Patriot
8   2    Braveheart
9   3         LOTR1
10  3         LOTR2
11  4     Gladiator
12  4       Patriot
13  4   Sixth Sense
14  5     Gladiator
15  5       Patriot
16  5   Sixth Sense
17  6     Gladiator
18  6       Patriot
19  6   Sixth Sense
20  7 Harry Potter1
21  7 Harry Potter2
22  8     Gladiator
23  8       Patriot
24  9     Gladiator
25  9       Patriot
26  9   Sixth Sense
27 10   Sixth Sense
28 10          LOTR
29 10     Galdiator
30 10    Green Mile

I need a list that looks like this:

TR1     c("Sixth Sense","LOTR1","Harry Potter1","Green Mile","LOTR2")
TR2     c("Gladiator","Patriot","Braveheart")
TR3     c("LOTR1","LOTR2")
....

3 answers:

Answer 0: (score: 2)

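Your aggregate command can work, but you did not specify the arguments correctly: by = list("ID") passes the literal string "ID" rather than the ID column. You need something like this: with(DF, aggregate(Item, list(ID), FUN = function(x) c(as.character(x))))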
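To make the cause of the error concrete, here is a minimal sketch using a shortened stand-in for the question's dvd data frame (only the first three IDs, which is an assumption made to keep it short). It captures the error from the original call and then passes the column itself:

```r
# Shortened stand-in for the dvd data frame from the question (IDs 1 to 3 only).
dvd <- data.frame(
  ID   = c(1, 1, 1, 1, 1, 2, 2, 2, 3, 3),
  Item = c("Sixth Sense", "LOTR1", "Harry Potter1", "Green Mile", "LOTR2",
           "Gladiator", "Patriot", "Braveheart", "LOTR1", "LOTR2"),
  stringsAsFactors = FALSE
)

# list("ID") holds a single string, so its length (1) does not match nrow(dvd).
msg <- tryCatch(
  aggregate(dvd, by = list("ID"), FUN = c),
  error = function(e) conditionMessage(e)
)
print(msg)

# Passing the ID column itself yields one group per transaction.
fixed <- aggregate(dvd$Item, by = list(ID = dvd$ID), FUN = c)
str(fixed$x)
```

Because the groups have different sizes, aggregate cannot simplify the result and stores the items as a list column (named x here, since an unnamed vector was aggregated).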

Alternatively, you can use the formula method of aggregate:

aggregate(Item ~ ID, DF, c)
#    ID                                                 Item
# 1   1 Sixth Sense, LOTR1, Harry Potter1, Green Mile, LOTR2
# 2  10             Sixth Sense, LOTR, Galdiator, Green Mile
# 3   2                       Gladiator, Patriot, Braveheart
# 4   3                                         LOTR1, LOTR2
# 5   4                      Gladiator, Patriot, Sixth Sense
# 6   5                      Gladiator, Patriot, Sixth Sense
# 7   6                      Gladiator, Patriot, Sixth Sense
# 8   7                         Harry Potter1, Harry Potter2
# 9   8                                   Gladiator, Patriot
# 10  9                      Gladiator, Patriot, Sixth Sense
str(.Last.value)
# 'data.frame':  10 obs. of  2 variables:
# $ ID  : chr  "1" "10" "2" "3" ...
# $ Item:List of 10
#  ..$ 1 : chr  "Sixth Sense" "LOTR1" "Harry Potter1" "Green Mile" ...
#  ..$ 6 : chr  "Sixth Sense" "LOTR" "Galdiator" "Green Mile"
#  ..$ 10: chr  "Gladiator" "Patriot" "Braveheart"
#  ..$ 13: chr  "LOTR1" "LOTR2"
#  ..$ 15: chr  "Gladiator" "Patriot" "Sixth Sense"
#  ..$ 18: chr  "Gladiator" "Patriot" "Sixth Sense"
#  ..$ 21: chr  "Gladiator" "Patriot" "Sixth Sense"
#  ..$ 24: chr  "Harry Potter1" "Harry Potter2"
#  ..$ 26: chr  "Gladiator" "Patriot"
#  ..$ 28: chr  "Gladiator" "Patriot" "Sixth Sense"

Or, you can use the "data.table" package:

library(data.table)
as.data.table(DF)[, list(list(Item)), by = ID]
#     ID                                               V1
#  1:  1 Sixth Sense,LOTR1,Harry Potter1,Green Mile,LOTR2
#  2:  2                     Gladiator,Patriot,Braveheart
#  3:  3                                      LOTR1,LOTR2
#  4:  4                    Gladiator,Patriot,Sixth Sense
#  5:  5                    Gladiator,Patriot,Sixth Sense
#  6:  6                    Gladiator,Patriot,Sixth Sense
#  7:  7                      Harry Potter1,Harry Potter2
#  8:  8                                Gladiator,Patriot
#  9:  9                    Gladiator,Patriot,Sixth Sense
# 10: 10            Sixth Sense,LOTR,Galdiator,Green Mile

Answer 1: (score: 2)

The read.transactions function in arules has a format argument that solves your problem. Here is the usage:

read.transactions(file, format = c("basket", "single"), sep = NULL,
                  cols = NULL, rm.duplicates = FALSE, encoding = "unknown")

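See the format argument? You can use "basket" or "single" to describe the format of the input data. You are trying to convert your data into "basket" format, but the data you have is already in "single" format: each row contains a single item together with its transaction ID. Just call read.transactions with format set to "single" and you're golden.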
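As a rough sketch of that suggestion (assuming the arules package is installed, and using a shortened stand-in for the question's data written to a temporary file), the call might look like this. The comma separator, the headerless file, and the cols = c(1, 2) choice are assumptions of this example, not part of the original answer:

```r
library(arules)

# Shortened stand-in for the question's data: one ID/Item pair per row.
DF <- data.frame(
  ID   = c(1, 1, 2, 2, 2, 3, 3),
  Item = c("Sixth Sense", "LOTR1", "Gladiator", "Patriot", "Braveheart",
           "LOTR1", "LOTR2"),
  stringsAsFactors = FALSE
)

# Write the pairs to a temporary comma-separated file without a header row.
# A comma is used because the item names themselves contain spaces.
tmp <- tempfile(fileext = ".csv")
write.table(DF, tmp, sep = ",", quote = FALSE,
            row.names = FALSE, col.names = FALSE)

# For format = "single", cols names the transaction-ID column and the item column.
trans <- read.transactions(tmp, format = "single", sep = ",", cols = c(1, 2))
inspect(trans)
unlink(tmp)
```

This skips the reshaping step entirely: read.transactions groups the rows by ID while reading the file.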

Answer 2: (score: 1)

I think split will do the job for you.

    DF <- structure(list(ID = c(1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 
4L, 4L, 4L, 5L, 5L, 5L, 6L, 6L, 6L, 7L, 7L, 8L, 8L, 9L, 9L, 9L, 
10L, 10L, 10L, 10L), Item = c("   Sixth Sense", "         LOTR1", 
" Harry Potter1", "    Green Mile", "         LOTR2", "     Gladiator", 
"       Patriot", "    Braveheart", "         LOTR1", "         LOTR2", 
"     Gladiator", "       Patriot", "   Sixth Sense", "     Gladiator", 
"       Patriot", "   Sixth Sense", "     Gladiator", "       Patriot", 
"   Sixth Sense", " Harry Potter1", " Harry Potter2", "     Gladiator", 
"       Patriot", "     Gladiator", "       Patriot", "   Sixth Sense", 
"   Sixth Sense", "          LOTR", "     Galdiator", "    Green Mile"
)), .Names = c("ID", "Item"), class = "data.frame", row.names = c(NA, 
-30L))

DF$Item <- trimws(DF$Item)  # strip the padding whitespace around the item names
result <- split(DF$Item, DF$ID)
names(result) <- gsub("(.*)", "TR\\1", names(result))
result
## $TR1
## [1] "Sixth Sense"   "LOTR1"         "Harry Potter1" "Green Mile"    "LOTR2"        
## 
## $TR2
## [1] "Gladiator"  "Patriot"    "Braveheart"
## 
## $TR3
## [1] "LOTR1" "LOTR2"
## 
## $TR4
## [1] "Gladiator"   "Patriot"     "Sixth Sense"
## 
## $TR5
## [1] "Gladiator"   "Patriot"     "Sixth Sense"
## 
## $TR6
## [1] "Gladiator"   "Patriot"     "Sixth Sense"
## 
## $TR7
## [1] "Harry Potter1" "Harry Potter2"
## 
## $TR8
## [1] "Gladiator" "Patriot"  
## 
## $TR9
## [1] "Gladiator"   "Patriot"     "Sixth Sense"
## 
## $TR10
## [1] "Sixth Sense" "LOTR"        "Galdiator"   "Green Mile"
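
If the end goal is an arules transactions object, a list like the one above can be coerced directly. The following is a minimal sketch, assuming the arules package is installed and using a shortened stand-in for DF; paste0 is used here for the TR prefix as a simpler stand-in for the gsub call:

```r
library(arules)

# Shortened stand-in for DF with already-trimmed item names.
DF <- data.frame(
  ID   = c(1L, 1L, 2L, 2L, 2L, 3L, 3L),
  Item = c("Sixth Sense", "LOTR1", "Gladiator", "Patriot", "Braveheart",
           "LOTR1", "LOTR2"),
  stringsAsFactors = FALSE
)

# One character vector of items per transaction ID.
result <- split(DF$Item, DF$ID)
names(result) <- paste0("TR", names(result))

# arules provides a coercion from a list of item vectors to "transactions".
trans <- as(result, "transactions")
inspect(trans)
```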