我有一个应用于离散化处理的数据集,我想将数据集强制转换为使用arules包的事务。
CLUST_K <- structure(list(LONGITUDE = c(118.5, 118.5, 118.5, 118.5, 118.5,
118.5), LATITUDE = c(-11.5, -11.5, -11.5, -11.5, -11.5, -11.5
), DATE_START = structure(c(1419897600, 1419984000, 1420070400,
1420156800, 1420243200, 1420329600), class = c("POSIXct", "POSIXt"
)), DATE_END = structure(c(1420502400, 1420588800, 1420675200,
1420761600, 1420848000, 1420934400), class = c("POSIXct", "POSIXt"
)), FLAG = c(2, 1, 2, 2, 2, 2), SURFSKINTEMP = c(13L, 1L, 16L,
16L, 7L, 13L), SURFAIRTEMP = c(6L, 6L, 6L, 6L, 6L, 6L), TOTH2OVAP = c(5L,
17L, 17L, 17L, 17L, 17L), TOTO3 = c(16L, 16L, 16L, 10L, 7L, 7L
), TOTCO = c(12L, 12L, 8L, 4L, 12L, 12L), TOTCH4 = c(13L, 14L,
6L, 6L, 11L, 7L), OLR_ARIS = c(10L, 4L, 4L, 7L, 5L, 10L), CLROLR_ARIS = c(10L,
4L, 4L, 7L, 5L, 10L), OLR_NOAA = c(10L, 10L, 10L, 10L, 7L, 9L
), MODIS_LST = c(1L, 1L, 1L, 1L, 1L, 1L)), .Names = c("LONGITUDE",
"LATITUDE", "DATE_START", "DATE_END", "FLAG", "SURFSKINTEMP",
"SURFAIRTEMP", "TOTH2OVAP", "TOTO3", "TOTCO", "TOTCH4", "OLR_ARIS",
"CLROLR_ARIS", "OLR_NOAA", "MODIS_LST"), row.names = c(NA, 6L
), class = "data.frame")
从数据集CLUST_K,你可以看到
LONGITUDE LATITUDE DATE_START DATE_END FLAG SURFSKINTEMP SURFAIRTEMP TOTH2OVAP TOTO3 TOTCO TOTCH4 OLR_ARIS CLROLR_ARIS OLR_NOAA MODIS_LST
1 118.5 -11.5 2014-12-30 2015-01-06 2 13 6 5 16 12 13 10 10 10 1
2 118.5 -11.5 2014-12-31 2015-01-07 1 1 6 17 16 12 14 4 4 10 1
3 118.5 -11.5 2015-01-01 2015-01-08 2 16 6 17 16 8 6 4 4 10 1
4 118.5 -11.5 2015-01-02 2015-01-09 2 16 6 17 10 4 6 7 7 10 1
5 118.5 -11.5 2015-01-03 2015-01-10 2 7 6 17 7 12 11 5 5 7 1
6 118.5 -11.5 2015-01-04 2015-01-11 2 13 6 17 7 12 7 10 10 9 1
数据集的第一列到第五列是交易信息,第6列到第15列是交易,它们适用于离散化处理。
当我尝试将数据集强制转换为事务时
CLUST_K_R <- CLUST_K[,6:15]
CLUST_K_R_T <- as(CLUST_K_R,"transactions")
Error in asMethod(object) :
column(s) 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 not logical or a factor. Discretize the columns first.
但我的数据集已经应用于离散化进程
当我使用拆分时,它似乎也不正确
> s1 <- split(CLUST_K$SURFSKINTEMP, CLUST_K$SURFAIRTEMP,CLUST_K$TOTH2OVAP, CLUST_K$TOTO3)
> Tr <- as(s1,"transactions")
Warning message:
In asMethod(object) : removing duplicated items in transactions
> Tr
transactions in sparse format with
1 transactions (rows) and
4 items (columns)
只剩下1笔交易,但在我的情况下应该是6笔交易。
答案 0 :(得分:0)
由于您已经对数据进行了离散化(通过群集),您只需确保将数据编码为标称值(因子)而不是数字(整数)。
for(i in 1:ncol(CLUST_K_R)) CLUST_K_R[[i]] <- as.factor(CLUST_K_R[[i]])
CLUST_K_R_T <- as(CLUST_K_R,"transactions")
summary(CLUST_K_R_T)
transactions as itemMatrix in sparse format with
6 rows (elements/itemsets/transactions) and
30 columns (items) and a density of 0.3333333
most frequent items:
SURFAIRTEMP=6 MODIS_LST=1 TOTH2OVAP=17 TOTCO=12 OLR_NOAA=10 (Other)
6 6 5 4 4 35
element (itemset/transaction) length distribution:
sizes
10
6
Min. 1st Qu. Median Mean 3rd Qu. Max.
10 10 10 10 10 10
includes extended item information - examples:
labels variables levels
1 SURFSKINTEMP=1 SURFSKINTEMP 1
2 SURFSKINTEMP=7 SURFSKINTEMP 7
3 SURFSKINTEMP=13 SURFSKINTEMP 13
includes extended transaction information - examples:
transactionID
1 1
2 2
3 3