Question

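For a current project, I am trying to find a way to convert a large amount of table data (300+ observations of 192 variables) into transaction data for arules. A large number of the variables are logical.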

I have tried the following, using library(arules):

newdata <- read.transactions("olddata.csv", format = "basket", rm.duplicates = FALSE, skip = 1)

But I receive the following error: Error in asMethod(object) : can not coerce list with transactions with duplicated items

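I do not want to remove the duplicates, because I would lose a large amount of data: it drops every repeated logical T/F after the first occurrence.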
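One possible way to keep both TRUE and FALSE without writing an intermediate file is to coerce the data frame directly. The sketch below is not from the original post: the small `olddata` frame is a hypothetical stand-in, and it assumes that arules turns a data frame of factors into one `variable=level` item per cell, which is worth checking against the installed version. Non-logical columns such as the date would need separate handling.

```r
# Hypothetical stand-in for olddata, containing only logical variables.
olddata <- data.frame(
  Recipient = c(TRUE, FALSE, TRUE),
  Opened    = c(FALSE, FALSE, TRUE)
)

# Turn each logical column into a factor with both levels, so that FALSE
# is an explicit level rather than just the absence of TRUE.
olddata[] <- lapply(olddata, factor, levels = c(FALSE, TRUE))

# Assumption: arules is installed and coerces a data frame of factors into
# one "variable=level" item per cell. The step is skipped if it is missing.
if (requireNamespace("arules", quietly = TRUE)) {
  library(arules)
  trans <- as(olddata, "transactions")
  inspect(trans)
}
```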

I figured I could try a for loop to accomplish my task:

newdata <- ""
for (row in 1:nrow(olddata)) {
  if (row !=1) {
    newdata <- paste0(newdata, "\n")}
  newdata <- paste0(newdata, row,",")
  for (col in 2:ncol(olddata)) {
    if (col !=2) {
      newdata <- paste0(newdata, ",")}
    newdata <- paste0(newdata, colnames(olddata),"=", olddata[row,col])}
}

write(newdata,"newdata.csv")

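My goal is for the value of each variable in each observation to look like this: columnnameA=TRUE, columnnameB=FALSE, etc. This would eliminate the "duplicates" for the read.transactions function and retain all of the data.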
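The target format described here can also be built without nested loops. The following is a minimal base R sketch under assumed data: the three-row `olddata` frame and its `id` and `Opened` columns are hypothetical, and the first column is skipped only because the loop in the post also starts at column 2.

```r
# Hypothetical stand-in for olddata: a first column that is skipped,
# followed by logical variables.
olddata <- data.frame(
  id        = 1:3,
  Recipient = c(TRUE, FALSE, TRUE),
  Opened    = c(FALSE, FALSE, TRUE)
)

# Columns to label: everything except the first one.
vars <- names(olddata)[-1]

# For each column, build "columnname=value" for all rows at once.
labelled <- lapply(vars, function(v) paste0(v, "=", olddata[[v]]))

# Paste the labelled columns together row by row, comma-separated,
# giving one basket line per observation.
basket_lines <- do.call(paste, c(labelled, sep = ","))

# Write one observation per line.
writeLines(basket_lines, "newdata.csv")
```

A file in this shape would presumably still need `sep = ","` when passed to `read.transactions` with `format = "basket"`.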

However, my output starts like this:

 [1] "1,Recipient=Thu Feb 04 21:52:00 UTC      2016,Recipient=TRUE,Recipient=TRUE,Recipient=FALSE,Recipient=TRUE,Recipient=FALSE,Recipient=FALSE,Recipient=FALSE,Recipient=FALSE,Recipient=FALSE,Recipient=FALSE,Recipient=FALSE,Recipient=FALSE,Recipient=FALSE,Recipient=FALSE,Recipient=FALSE,Recipient=FALSE\n2,Recipient=Thu Feb 04 21:52:00 UTC 2016,Recipient=TRUE,Recipient=TRUE,Recipient=FALSE,Recipient=TRUE,Recipient=FALSE,Recipient=FALSE,Recipient=FALSE,Recipient=FALSE,Recipient=FALSE,Recipient=FALSE,Recipient=FALSE,Recipient=FALSE,Recipient=FALSE,Recipient=FALSE,Recipient=FALSE,Recipient=FALSE\n3

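Note that Recipient is the first variable name in my olddata object. After it has written every observation as Recipient=X, it changes to the next variable name and repeats. I end up with a file with over 5 million observations... oops! This is my first real attempt at a nested for loop. I am not sure whether this is the best approach or whether there is a better one.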
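A likely cause of the repeated names is that the inner loop passes the whole `colnames(olddata)` vector to `paste0`, which is vectorised and returns one string per column name, so `newdata` stops being a single string. The sketch below uses a hypothetical two-name vector `cols` to show the difference between passing the vector and indexing it with the loop variable, as in `colnames(olddata)[col]`.

```r
# Hypothetical two-column example to show how paste0 handles a vector.
cols <- c("Recipient", "Opened")

# Passing the whole name vector yields one string per column name.
all_names <- paste0("1,", cols, "=", TRUE)

# Indexing with the loop variable yields a single string for that column.
col <- 2
one_name <- paste0("1,", cols[col], "=", TRUE)

print(all_names)
print(one_name)
```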

Thanks in advance for any ideas or insights.

Converting table data to transactions in R

0 Answers: