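I have a sample dataset containing information about diabetic patients. I want to find sequential rules (using the arulesSequences library) whose rhs contains only one element: "id_58". Below is the code that downloads the dataset (565 KB) and mines some rules. Can you tell me what I need to do to reach this goal?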
library(arulesSequences)
download.file('http://staff.ii.pw.edu.pl/~gprotazi/dydaktyka/dane/diab_trans.data','diab_trans.data')
# some operations which result in transaction form of data
diab.df <- read.csv("diab_trans.data", header=TRUE, stringsAsFactors = FALSE)
write.table(diab.df, "diab_trans2.data", sep = "," , row.names = FALSE, col.names = FALSE )
diabSeq <- read_baskets(con = "diab_trans2.data", sep =",", info = c("sequenceID","eventID"))
# setting parameter, mining frequent sequential patterns and rules
seqParam = new ("SPparameter",support = 0.6, maxsize = 4, mingap=600, maxgap =150000, maxlen = 3)
patSeq= cspade(diabSeq,seqParam, control = list(verbose = TRUE, tidLists = FALSE, summary= TRUE))
seqRules = ruleInduction(patSeq,confidence = 0.8)
# inspect(seqRules)
# ideas which do not work
# finalSeq <- subset(seqRules, subset = (rhs %in% "id_58"))
# finalSeq <- subset(seqRules, rhs(seqRules) %in% c('id_58'))
Answer 0 (score: 1)
Your item labels contain double quotes:
> itemLabels(rhs(seqRules))
[1] "\"id_33\"" "\"id_60\"" "\"id_62\"" "2" "\"id_64\"" "\"id_34\""
[7] "0" "3" "4" "5" "6" "7"
[13] "72" "8" "9" "\"id_57\"" "\"id_58\""
So you need to either clean them up or include the quotes in the label you match against.
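Both options can be sketched in base R. This is a minimal illustration, assuming the quotes come from the default `quote = TRUE` of `write.table()` in the question's preprocessing step. The toy data frame and all variable names are made up for the example, and the final `subset()` call is left as a comment because it is not run here; it is only an assumption about how the matching carries over to `seqRules`.

```r
# Toy data standing in for diab_trans.data (hypothetical values, not the real file).
toy <- data.frame(sequenceID = c(1, 1, 2),
                  eventID    = c(10, 20, 10),
                  item       = c("id_33", "id_58", "id_58"),
                  stringsAsFactors = FALSE)

# By default, write.table() wraps character columns in double quotes ...
quoted_file <- tempfile()
write.table(toy, quoted_file, sep = ",", row.names = FALSE, col.names = FALSE)
quoted_lines <- readLines(quoted_file)

# Option 1: clean the labels at the source by writing them unquoted.
plain_file <- tempfile()
write.table(toy, plain_file, sep = ",", row.names = FALSE, col.names = FALSE,
            quote = FALSE)
plain_lines <- readLines(plain_file)

# Option 2: keep the file as is and match the label including its quotes.
labels <- c("\"id_33\"", "2", "\"id_58\"")
hit_plain  <- "id_58" %in% labels       # bare label does not match
hit_quoted <- "\"id_58\"" %in% labels   # quoted label matches

# With the rules from the question this would presumably become (untested here):
# finalSeq <- subset(seqRules, rhs(seqRules) %in% "\"id_58\"")
```

Writing the file with `quote = FALSE` is probably the cleaner route, since the labels then stay free of quotes in every later step, including `inspect()` output.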