Using R to replace item descriptions in transaction-type data with item numbers from a different table

Time: 2016-04-22 21:02:30

Tags: r string market-basket-analysis

First, we have some transaction data; the built-in example dataset will do.

library(arules)

## Can use the built-in example dataset (Groceries ships with arules)
data(Groceries)

groceries <- as(Groceries, "transactions") # coerce to the 'transactions' class

summary(groceries)

The output is:

most frequent items:
  whole milk other vegetables       rolls/buns             soda           yogurt          (Other) 
        2513             1903             1809             1715             1372            34055 

But we also have another data table whose values we want to use as labels:

itemnum <- c(1,2,3,4,5)
ProductName_ <- factor(c("whole milk", "other vegetables", "rolls/buns", "soda", "yogurt"))
ProductNames <- data.frame(itemnum, ProductName_)

How can I use the itemnum from the second table to replace the product descriptions in the first one?

So that when I run:

summary(groceries)

the output is:

most frequent items:
     1      2       3      4       5       (Other) 
  2513   1903    1809   1715    1372         34055 

1 Answer:

Answer 0: (score: 0)

You can change the data before calling summary:

library(arules)
library(datasets)
data(Groceries)
summary(Groceries)
# transactions as itemMatrix in sparse format with
# 9835 rows (elements/itemsets/transactions) and
# 169 columns (items) and a density of 0.02609146 
# 
# most frequent items:
#       whole milk other vegetables       rolls/buns             soda           yogurt          (Other) 
#             2513             1903             1809             1715             1372            34055 
# 
# element (itemset/transaction) length distribution:
#     sizes
# 1    2    3    4    5    6    7    8    9   10   11   12   13   14   15   16   17   18   19 
# 2159 1643 1299 1005  855  645  545  438  350  246  182  117   78   77   55   46   29   14   14 
# 20   21   22   23   24   26   27   28   29   32 
# 9   11    4    6    1    1    1    1    3    1 
# 
# Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
# 1.000   2.000   3.000   4.409   6.000  32.000 
# 
# includes extended item information - examples:
#     labels  level2           level1
# 1 frankfurter sausage meet and sausage
# 2     sausage sausage meet and sausage
# 3  liver loaf sausage meet and sausage


itemnum <- c(1,2,3,4,5)
ProductName_ <- factor(c("whole milk", "other vegetables", "rolls/buns", "soda", "yogurt"))
ProductNames <- data.frame(itemnum, ProductName_)

# Change the values in Groceries@itemInfo$labels (see also plyr::mapvalues).
# Note: labels with no match in ProductNames become NA.
Groceries@itemInfo$labels <- ProductNames$itemnum[match(Groceries@itemInfo$labels,ProductNames$ProductName_)]
summary(Groceries)
# transactions as itemMatrix in sparse format with
# 9835 rows (elements/itemsets/transactions) and
# 169 columns (items) and a density of 0.02609146 
# 
# most frequent items:
#     1       2       3       4       5 (Other) 
# 2513    1903    1809    1715    1372   34055 
# 
# element (itemset/transaction) length distribution:
#     sizes
# 1    2    3    4    5    6    7    8    9   10   11   12   13   14   15   16   17   18   19 
# 2159 1643 1299 1005  855  645  545  438  350  246  182  117   78   77   55   46   29   14   14 
# 20   21   22   23   24   26   27   28   29   32 
# 9   11    4    6    1    1    1    1    3    1 
# 
# Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
# 1.000   2.000   3.000   4.409   6.000  32.000 
# 
# includes extended item information - examples:
#     labels  level2           level1
# 1     NA sausage meet and sausage
# 2     NA sausage meet and sausage
# 3     NA sausage meet and sausage
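
As the last lines of the output show, every label that is missing from ProductNames ends up as NA. If that is not wanted, one possible variation is to overwrite only the labels that were actually matched. The sketch below uses base R only; `relabel_items` and `example_labels` are hypothetical names introduced for illustration and are not part of arules, and the short character vector merely stands in for `Groceries@itemInfo$labels`.

```r
# Hypothetical helper (not part of arules): map item labels to item numbers
# via a lookup table, keeping the original label where no match exists.
relabel_items <- function(labels, lookup,
                          from = "ProductName_", to = "itemnum") {
  labels <- as.character(labels)
  # match() gives, for each label, its row position in the lookup (or NA)
  idx <- match(labels, as.character(lookup[[from]]))
  matched <- !is.na(idx)
  # Overwrite only the labels that were found in the lookup table
  labels[matched] <- as.character(lookup[[to]][idx[matched]])
  labels
}

# Same lookup table as in the question
ProductNames <- data.frame(
  itemnum = c(1, 2, 3, 4, 5),
  ProductName_ = factor(c("whole milk", "other vegetables", "rolls/buns",
                          "soda", "yogurt"))
)

# Small stand-in for Groceries@itemInfo$labels (assumption for illustration)
example_labels <- c("frankfurter", "whole milk", "soda", "liver loaf")
new_labels <- relabel_items(example_labels, ProductNames)
print(new_labels)
# [1] "frankfurter" "1"           "4"           "liver loaf"
```

With arules loaded, the same helper could presumably be applied in place of the assignment above, for example `Groceries@itemInfo$labels <- relabel_items(Groceries@itemInfo$labels, ProductNames)`. The labels stay character rather than numeric here, so the unmatched product names and the substituted numbers can live in the same column.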