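I have downloaded the last.fm dataset, which contains users and the songs they have listened to. I am trying to build association rules to find out whether listening to one artist implies listening to another.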
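The dataset comes from last.fm and has 359349 users and 292542 artists. I first use dMcast() to build a boolean matrix from the user and artist columns. I then convert it to an ngCMatrix and from there to transactions.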
Here is my code:
library(data.table)
library(Matrix)
library(Matrix.utils)
library(magrittr)
library(arules)
col_names <- c(
"user_id",
"id2",
"artist",
"song"
)
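# Read the tab-separated file and apply the column names defined above
last.fm <- fread("R/upworkR/Rex - association rules/lastfm-dataset-360K/usersha1-artmbid-artname-plays.tsv",
                 sep = '\t',
                 col.names = col_names, encoding = 'UTF-8')
# Treat the user identifier as a factor
last.fm$user_id <- as.factor(last.fm$user_id)
(last.fm)
# Cast to a sparse user-by-artist matrix, then coerce it to arules transactions
last.fm.matrix <- dMcast(last.fm, formula = user_id ~ artist)
last.fm.ngcM <- as(last.fm.matrix, "ngCMatrix") %>% as("transactions")
# Mine rules with the chosen support and confidence thresholds
music.rules <- apriori(last.fm.ngcM, parameter = list(supp = 0.0009, conf = 0.3))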
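I have tried various support and confidence values, but it either generates no rules or runs into memory problems. I have also changed maxlen and maxtime, with no luck so far. The output is as follows: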
Parameter specification:
 confidence minval smax arem  aval originalSupport maxtime support minlen maxlen target   ext
        0.3    0.1    1 none FALSE            TRUE       5   9e-04      1     10  rules FALSE

Algorithmic control:
 filter tree heap memopt load sort verbose
    0.1 TRUE TRUE  FALSE TRUE    2    TRUE

Absolute minimum support count: 263

set item appearances ...[0 item(s)] done [0.00s].
set transactions ...[359349 item(s), 292542 transaction(s)] done [26.21s].
sorting and recoding items ... [0 item(s)] done [0.14s].
creating transaction tree ... done [0.02s].
checking subsets of size 1 done [0.00s].
writing ... [0 rule(s)] done [0.00s].
creating S4 object ... done [0.10s].
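
A possible reading of this log: it reports 359349 items and 292542 transactions, which are the user and artist counts swapped. arules stores transactions as an ngCMatrix with items in rows and transactions in columns, so coercing a users-by-artists matrix directly would treat each user as an item and each artist as a transaction. The sketch below reproduces that on a made-up toy table (the names `toy`, `m`, `wrong`, `right`, and `alt` are illustrative and not from the code above) and shows two ways to end up with users as transactions: transposing with `t()` before the coercion, or splitting the artist column by user. It is a sketch under those assumptions and has not been run on the full dataset.

```r
library(Matrix)
library(arules)

# Made-up toy data standing in for the last.fm table: 4 users, 3 artists.
toy <- data.frame(
  user_id = factor(c("u1", "u1", "u2", "u2", "u3", "u3", "u4")),
  artist  = factor(c("A",  "B",  "A",  "B",  "A",  "C",  "B"))
)

# Users-by-artists pattern matrix (ngCMatrix), analogous to the dMcast() result.
m <- sparseMatrix(
  i = as.integer(toy$user_id),
  j = as.integer(toy$artist),
  dims = c(nlevels(toy$user_id), nlevels(toy$artist)),
  dimnames = list(levels(toy$user_id), levels(toy$artist))
)

# Direct coercion: rows (users) become items, columns (artists) become transactions.
wrong <- as(m, "transactions")

# Transposing first makes artists the items and users the transactions.
right <- as(t(m), "transactions")

# Alternative that skips the matrix step: one character vector of artists per user.
alt <- as(split(as.character(toy$artist), toy$user_id), "transactions")

# dim() on a transactions object gives (number of transactions, number of items).
print(dim(wrong))
print(dim(right))
print(dim(alt))
```

On the real data the transposed variant would correspond to `as(t(as(last.fm.matrix, "ngCMatrix")), "transactions")`. With users as transactions, the absolute support count for supp = 0.0009 would be taken against 359349 transactions (roughly 323) rather than 292542, and the items being counted would be artists rather than individual users.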