Correctly converting a data frame to transactions for arules in R

Time: 2018-03-27 15:11:11

Tags: r dataframe arules market-basket-analysis

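I need to mine association rules in R, and I found this example here: http://www.salemmarafi.com/code/market-basket-analysis-with-r/ The example uses data(Groceries), but the author also provides the raw data set, Groceries.csv. A sample of it: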

structure(list(chocolate = structure(c(9L, 13L, 1L, 8L, 16L, 
2L, 14L, 11L, 7L, 15L, 17L, 5L, 10L, 4L, 3L, 6L, 2L, 18L, 12L
), .Label = c("bottled water", "canned beer", "chicken,citrus fruit,tropical fruit,root vegetables,whole milk,frozen fish,rollsbuns", 
"chicken,pip fruit,other vegetables,whole milk,dessert,yogurt,whippedsour cream,rollsbuns,pasta,soda,waffles", 
"citrus fruit,pip fruit,root vegetables,other vegetables,whole milk,cream cheese ,domestic eggs,brown bread,margarine,baking powder,waffles", 
"frankfurter,citrus fruit,onions,other vegetables,whole milk,rollsbuns,sugar,soda", 
"frankfurter,rollsbuns,bottled water,fruitvegetable juice,hygiene articles", 
"frankfurter,sausage,butter,whippedsour cream,rollsbuns,margarine,spices", 
"fruitvegetable juice", "hamburger meat,other vegetables,whole milk,curd,yogurt,rollsbuns,pastry,semi-finished bread,margarine,bottled water,fruitvegetable juice", 
"meat,citrus fruit,berries,root vegetables,whole milk,soda", 
"packaged fruitvegetables,whole milk,curd,yogurt,domestic eggs,brown bread,mustard,pickled vegetables,bottled water,misc. beverages", 
"pickled vegetables,coffee", "root vegetables", "tropical fruit,margarine,rum", 
"tropical fruit,pip fruit,onions,other vegetables,whole milk,domestic eggs,sugar,soups,tea,soda,hygiene articles,napkins", 
"tropical fruit,root vegetables,herbs,whole milk,butter milk,whippedsour cream,flour,hygiene articles", 
"turkey,pip fruit,salad dressing,pastry"), class = "factor")), .Names = "chocolate", class = "data.frame", row.names = c(NA, 
-19L))

I load this data with:

g=read.csv("g.csv",sep=";")

Then I convert it into the transactions format that arules needs:

#'@importClassesFrom arules transactions
trans = as(g, "transactions")

Let's inspect data(Groceries):

> str(Groceries)
Formal class 'transactions' [package "arules"] with 3 slots
  ..@ data       :Formal class 'ngCMatrix' [package "Matrix"] with 5 slots
  .. .. ..@ i       : int [1:43367] 13 60 69 78 14 29 98 24 15 29 ...
  .. .. ..@ p       : int [1:9836] 0 4 7 8 12 16 21 22 27 28 ...
  .. .. ..@ Dim     : int [1:2] 169 9835
  .. .. ..@ Dimnames:List of 2
  .. .. .. ..$ : NULL
  .. .. .. ..$ : NULL
  .. .. ..@ factors : list()
  ..@ itemInfo   :'data.frame': 169 obs. of  3 variables:
  .. ..$ labels: chr [1:169] "frankfurter" "sausage" "liver loaf" "ham" ...
  .. ..$ level2: Factor w/ 55 levels "baby food","bags",..: 44 44 44 44 44 44 44 42 42 41 ...
  .. ..$ level1: Factor w/ 10 levels "canned food",..: 6 6 6 6 6 6 6 6 6 6 ...
  ..@ itemsetInfo:'data.frame': 0 obs. of  0 variables
>

And the data I converted from the raw CSV:

> str(trans)
Formal class 'transactions' [package "arules"] with 3 slots
  ..@ data       :Formal class 'ngCMatrix' [package "Matrix"] with 5 slots
  .. .. ..@ i       : int [1:9835] 1265 6162 6377 4043 3585 6475 4431 3535 4401 6490 ...
  .. .. ..@ p       : int [1:9836] 0 1 2 3 4 5 6 7 8 9 ...
  .. .. ..@ Dim     : int [1:2] 7011 9835
  .. .. ..@ Dimnames:List of 2
  .. .. .. ..$ : NULL
  .. .. .. ..$ : NULL
  .. .. ..@ factors : list()
  ..@ itemInfo   :'data.frame': 7011 obs. of  3 variables:
  .. ..$ labels   : chr [1:7011] "tr=abrasive cleaner" "tr=abrasive cleaner,napkins" "tr=artif. sweetener" "tr=artif. sweetener,coffee" ...
  .. ..$ variables: Factor w/ 1 level "tr": 1 1 1 1 1 1 1 1 1 1 ...
  .. ..$ levels   : Factor w/ 7011 levels "abrasive cleaner",..: 1 2 3 4 5 6 7 8 9 10 ...
  ..@ itemsetInfo:'data.frame': 9835 obs. of  1 variable:
  .. ..$ transactionID: chr [1:9835] "1" "2" "3" "4" ...
> 

For data(Groceries) we see:

transactions in sparse format with
 9835 transactions (rows) and
 169 items (columns)

For my trans data:

 9835 transactions (rows) and
 7011 items (columns)

That is, from Groceries.csv I get 7011 columns, while the built-in example has 169.

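Why does this happen, and how do I convert this file correctly? I need to understand it, because otherwise I cannot work with my own file.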

I tried searching for similar topics, but these two posts did not help me: "How to prep transaction data into basket for arules" and "R (arules) Convert dataframe into transactions and remove NA".

1 answer:

Answer 0 (score: 2)

This happens because the downloaded data is comma-separated, but your code splits it on semicolons (sep=";"). Each line of Groceries.csv therefore stays in one piece, and every distinct basket string, such as "abrasive cleaner,napkins", becomes a separate "item". That is why you get 7011 items instead of 169. If you split on commas instead of semicolons, you should get the expected output.
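The mechanism can be reproduced in base R. This is a minimal sketch. The baskets are hypothetical, built from item names that appear in the question's data, and they follow the Groceries.csv layout of one basket per line with comma-separated items:

```r
# Hypothetical baskets in the Groceries.csv layout: one basket per line,
# items separated by commas.
baskets <- c("whole milk,yogurt",
             "whole milk,rollsbuns",
             "yogurt,soda",
             "whole milk",
             "rollsbuns,soda",
             "whole milk,yogurt,soda")

# Reading with sep = ";" splits nothing (there are no semicolons), so each
# whole line lands in column V1 and every distinct line acts as one "item".
one_col <- read.csv(text = baskets, sep = ";", header = FALSE,
                    stringsAsFactors = FALSE)
wrong_items <- unique(one_col$V1)

# Splitting each line on "," recovers the individual products instead.
right_items <- unique(unlist(strsplit(baskets, ",", fixed = TRUE)))

cat("distinct 'items' when splitting on ';':", length(wrong_items), "\n")
cat("distinct items when splitting on ',':", length(right_items), "\n")
```

Here, six distinct basket strings masquerade as six "items" even though only four products exist. On the full file, the same effect turns 169 products into 7011 distinct basket strings.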

Below is the result with sep defined as ";" when the file is read directly with read.transactions:

> trans <- read.transactions("~/Downloads/groceries.csv", format = 'basket', sep = ';')
> str(trans)
Formal class 'transactions' [package "arules"] with 3 slots
  ..@ data       :Formal class 'ngCMatrix' [package "Matrix"] with 5 slots
  .. .. ..@ i       : int [1:9835] 1265 6162 6377 4043 3585 6475 4431 3535 4401 6490 ...
  .. .. ..@ p       : int [1:9836] 0 1 2 3 4 5 6 7 8 9 ...
  .. .. ..@ Dim     : int [1:2] 7011 9835
  .. .. ..@ Dimnames:List of 2
  .. .. .. ..$ : NULL
  .. .. .. ..$ : NULL
  .. .. ..@ factors : list()
  ..@ itemInfo   :'data.frame': 7011 obs. of 1 variable:
  .. ..$ labels: chr [1:7011] "abrasive cleaner" "abrasive cleaner,napkins" "artif. sweetener" "artif. sweetener,coffee" ...
  ..@ itemsetInfo:'data.frame': 0 obs. of  0 variables

This produces the same 7011 comma-joined "items" as your data. Defining sep as "," instead splits every line into its individual items.
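Here is a minimal sketch of both routes to correct transactions, assuming the arules package is installed. The sample file and the data frame column name are hypothetical. The file is written to a temporary location in the same one-basket-per-line layout. The second route covers the case in the question where the baskets already sit in a data frame like g, one comma-joined basket per row.

```r
library(arules)

# Hypothetical sample in the Groceries.csv layout (one basket per line,
# items separated by commas), written to a temporary file.
baskets <- c("whole milk,yogurt",
             "whole milk,rollsbuns",
             "yogurt,soda",
             "whole milk",
             "rollsbuns,soda",
             "whole milk,yogurt,soda")
basket_file <- tempfile(fileext = ".csv")
writeLines(baskets, basket_file)

# 1) Read the raw file directly: format = "basket" means one transaction per
#    line, and sep = "," splits each line into its individual items.
trans <- read.transactions(basket_file, format = "basket", sep = ",")
print(dim(trans))         # rows = transactions, columns = items
print(itemLabels(trans))  # single products, no comma-joined labels

# 2) Baskets already in a data frame, one comma-joined basket per row
#    (like `g` in the question; the column name "basket" is hypothetical).
g <- data.frame(basket = baskets, stringsAsFactors = TRUE)
item_lists <- strsplit(as.character(g[[1]]), ",", fixed = TRUE)
item_lists <- lapply(item_lists, unique)  # guard against repeated items in a basket
trans2 <- as(item_lists, "transactions")
print(dim(trans2))
```

On the real file, the equivalent call would be read.transactions("groceries.csv", format = "basket", sep = ","). That should give the 169 items of data(Groceries), although the labels come out in alphabetical order and the level1/level2 item hierarchy of the built-in data set is not part of the raw CSV.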