I'm having trouble creating a transition matrix. Below is the dataset I'm working with:
Name Rating ID DATE(YYYYmmdd)
@0CC 1 71476 20000704
@0CC 1 71476 20001204
@0RM 1 73565 20000919
@0RM 2 49960 20000131
@0RM 1 44457 20001214
@0RM 1 59451 20001023
@0TL 2 73862 20001212
@0TL 3 19824 20000929
@0TL 1 70970 20001211
@0TL 3 48061 20000627
@0TL 1 48061 20001227
@1AJ 1 58875 20001214
@1AJ 3 56014 20001214
@1AJ 3 47340 20001214
@1AJ 3 19813 20001214
@1AL 1 44416 20000517
@1AL 4 59184 20000801
@1AL 3 59184 20000413
@1AL 4 72832 20001127
@1AL 1 52718 20000621
@1AL 2 59184 20000707
@1AL 3 73568 20001130
@1AL 3 72832 20001211
@1AL 3 44416 20000303
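What I want to do is this: for each unique Name, compare the IDs. If an ID occurs more than once, look at the dates and compare each rating with the one on the previous date. If the rating is the same, I ignore it, but if the rating changed, I want to count that transition.
Take the first two rows, Name @0CC: the ID matches and the ratings are the same, so I don't count it. Now look at @1AL: ID 59184 occurs three times, on 20000413, 20000707 and 20000801, with ratings 3, 2 and 4 respectively. Since the rating goes from 3 to 2 and then from 2 to 4, I want to record this in a transition matrix of the following form (rows are the rating moved from, columns the rating moved to):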
From \ To  1  2  3  4  5
1          0  0  0  0  0
2          0  0  0  1  0
3          0  1  0  0  0
4          0  0  0  0  0
5          0  0  0  0  0
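To make the counting rule concrete, here is a minimal base-R sketch for the @1AL / ID 59184 example alone: the three ratings are put into date order, and each change between consecutive observations adds one to cell [from, to] of a 5 x 5 matrix. The variable names are illustrative only.

```r
# Ratings and dates of the three @1AL rows with ID 59184, in file order
ratings <- c(4, 3, 2)
dates   <- c(20000801, 20000413, 20000707)

# Chronological order turns the ratings into 3, 2, 4
r <- ratings[order(dates)]

# Rows are the rating moved from, columns the rating moved to
m <- matrix(0L, nrow = 5, ncol = 5, dimnames = list(from = 1:5, to = 1:5))
for (k in seq_len(length(r) - 1)) {
  if (r[k] != r[k + 1]) {            # unchanged ratings are ignored
    m[r[k], r[k + 1]] <- m[r[k], r[k + 1]] + 1L
  }
}
m
```

This reproduces the two entries of the desired matrix: one in row 3, column 2 and one in row 2, column 4.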
This kind of data management is new to me. This is as far as I got:
for(i in unique(dataset$Name)
if dataset[,3]=dataset[,3]
I don't think the second line is even correct. I'm really stuck and would appreciate any advice I can get.
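For reference, a loop in the spirit of the attempt above could look like the following sketch. It assumes the data sit in a data frame called `dataset`, that the date column is named `DATE` (an assumed name; `read.table` would call it `DATE.YYYYmmdd.`), and that ratings run from 1 to 5.

```r
# The question's data; the column name DATE is an assumption
dataset <- data.frame(
  Name   = c("@0CC","@0CC","@0RM","@0RM","@0RM","@0RM",
             "@0TL","@0TL","@0TL","@0TL","@0TL",
             "@1AJ","@1AJ","@1AJ","@1AJ",
             "@1AL","@1AL","@1AL","@1AL","@1AL","@1AL","@1AL","@1AL","@1AL"),
  Rating = c(1,1,1,2,1,1, 2,3,1,3,1, 1,3,3,3, 1,4,3,4,1,2,3,3,3),
  ID     = c(71476,71476,73565,49960,44457,59451,
             73862,19824,70970,48061,48061,
             58875,56014,47340,19813,
             44416,59184,59184,72832,52718,59184,73568,72832,44416),
  DATE   = c(20000704,20001204,20000919,20000131,20001214,20001023,
             20001212,20000929,20001211,20000627,20001227,
             20001214,20001214,20001214,20001214,
             20000517,20000801,20000413,20001127,20000621,20000707,
             20001130,20001211,20000303)
)

n_ratings <- 5  # assumes ratings 1..5
trans <- matrix(0L, nrow = n_ratings, ncol = n_ratings,
                dimnames = list(from = 1:n_ratings, to = 1:n_ratings))

for (nm in unique(dataset$Name)) {
  rows_nm <- dataset[dataset$Name == nm, ]
  for (id in unique(rows_nm$ID)) {
    # all observations of this ID under this Name, oldest first
    obs <- rows_nm[rows_nm$ID == id, ]
    r <- obs$Rating[order(obs$DATE)]
    if (length(r) < 2) next        # a single observation has no transition
    for (k in seq_len(length(r) - 1)) {
      if (r[k] != r[k + 1]) {      # identical ratings are ignored
        trans[r[k], r[k + 1]] <- trans[r[k], r[k + 1]] + 1L
      }
    }
  }
}
trans
```

On the question's data this should give 2 in cell [3, 1] (@0TL/48061 and @1AL/44416) and 1 in each of cells [3, 2], [2, 4] and [4, 3].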
Answer (score: 1)
It took some time, but I think I found a solution to your problem:
Convert to data.table
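install.packages("data.table") # if not installed already
library(data.table)
### DT: your data.frame
### e.g. copy the table above to the clipboard and read it in:
#DT <- read.table("clipboard", header = TRUE)
### (read.table turns the header DATE(YYYYmmdd) into the column name DATE.YYYYmmdd.)
DT <- as.data.table(DT) # convert into data.table
setkey(DT, Name, DATE.YYYYmmdd.) # sort by Name, then chronologically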
# this shows an intermediate result: the ratings of each (Name, ID) pair in date order
DT[, print(Rating), by = list(Name, ID)]
# [1] 1 1
# [1] 2
# [1] 1
# [1] 1
# [1] 1
# [1] 3 1
# [1] 3
# [1] 1
# [1] 2
# [1] 1
# [1] 3
# [1] 3
# [1] 3
# [1] 3 1
# [1] 3 2 4
# [1] 1
# [1] 4 3
# [1] 3
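One problem is that data.table does not return a vector for each subset (as far as I know). The workaround is therefore to pack each group's ratings into a single multi-digit number and split it back into digits later.
Encoding the ratings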
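For comparison, per-group vectors can be kept as vectors without any encoding. The base-R sketch below uses only the @1AL rows to stay short: after sorting by date, `split()` returns one chronologically ordered rating vector per ID. In data.table, a list column such as `DT[, .(ratings = list(Rating)), by = .(Name, ID)]` holds the same kind of per-group vectors; that line is only a pointer and is not part of the runnable sketch.

```r
# The nine @1AL rows from the question (column name DATE is an assumption)
al <- data.frame(
  Rating = c(1, 4, 3, 4, 1, 2, 3, 3, 3),
  ID     = c(44416, 59184, 59184, 72832, 52718, 59184, 73568, 72832, 44416),
  DATE   = c(20000517, 20000801, 20000413, 20001127, 20000621,
             20000707, 20001130, 20001211, 20000303)
)

# Sort chronologically first, then split the ratings into one vector per ID
al <- al[order(al$DATE), ]
per_id <- split(al$Rating, al$ID)
per_id
```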
setVal <- function(vec){
  # pack e.g. c(3, 2, 4) into the single number 324
  res <- 0
  for (i in 1:length(vec)){
    res <- res + vec[i] * 10^(length(vec)-i)
  }
  return(as.integer(res))
}
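The encoding done by setVal() is ordinary positional (base-10) notation, so it can be written without a loop, and the reverse step is just a split into digits. A sketch with hypothetical helper names `encode_ratings()` and `decode_ratings()`:

```r
# Hypothetical helpers, not part of the answer's code.
# Vectorised equivalent of setVal(): c(3, 2, 4) -> 3*100 + 2*10 + 4*1
encode_ratings <- function(vec) {
  as.integer(sum(vec * 10^(rev(seq_along(vec)) - 1)))
}

# Inverse: split the decimal digits back into a rating vector
# (as.integer() first, so the number is never printed in scientific notation)
decode_ratings <- function(x) {
  as.integer(strsplit(as.character(as.integer(x)), "")[[1]])
}

encode_ratings(c(3, 2, 4))                   # [1] 324
decode_ratings(324L)                         # [1] 3 2 4
decode_ratings(encode_ratings(c(10, 3)))     # [1] 1 0 3, not 10 3
```

The last line shows why this scheme only works for single-digit ratings: the pair 10, 3 is encoded as 103 and comes back as three ratings.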
# store the encoded ratings of each (Name, ID) group in a new column R
DT[, R := setVal(Rating), by = list(Name, ID)]
DT # not what we want yet: each code is repeated on every row of its group, e.g. 324 appears 3 times (once per row contributing to it) and 11 appears twice
# Name Rating ID DATE.YYYYmmdd. R
# 1: @0CC 1 71476 20000704 11
# 2: @0CC 1 71476 20001204 11
# 3: @0RM 2 49960 20000131 2
# 4: @0RM 1 73565 20000919 1
# 5: @0RM 1 59451 20001023 1
# 6: @0RM 1 44457 20001214 1
# 7: @0TL 3 48061 20000627 31
# 8: @0TL 3 19824 20000929 3
# 9: @0TL 1 70970 20001211 1
# 10: @0TL 2 73862 20001212 2
# 11: @0TL 1 48061 20001227 31
# 12: @1AJ 1 58875 20001214 1
# 13: @1AJ 3 56014 20001214 3
# 14: @1AJ 3 47340 20001214 3
# 15: @1AJ 3 19813 20001214 3
# 16: @1AL 3 44416 20000303 31
# 17: @1AL 3 59184 20000413 324
# 18: @1AL 1 44416 20000517 31
# 19: @1AL 1 52718 20000621 1
# 20: @1AL 2 59184 20000707 324
# 21: @1AL 4 59184 20000801 324
# 22: @1AL 4 72832 20001127 43
# 23: @1AL 3 73568 20001130 3
# 24: @1AL 3 72832 20001211 43
#The result has to be filtered by unique pairs of Name and ID.
R <- DT[,unique(R), by = list(Name, ID)]$V1
#[1] 11 2 1 1 1 31 3 1 2 1 3 3 3 31 324 1 43 3
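Converting the result into a transition matrix
There may be simpler ways to convert R back into single digits, count the transitions and put them into a matrix, but this is what I came up with: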
TransitionMatrix <- function(col, ncol = 5){
  # add one count for every change between consecutive ratings in vec
  intoMat <- function(Mat, vec){
    if(length(vec)>1){
      for (i in 1:(length(vec)-1)){
        if (vec[i] != vec[i+1]){ # equal ratings are not counted
          Mat[vec[i], vec[i+1]] <- Mat[vec[i], vec[i+1]] + 1
        }
      }
    }
    return(Mat)
  }
  Mat <- matrix(0, ncol = ncol, nrow = ncol)
  for (j in 1:length(col)){
    L <- nchar(as.character(col[j])) # number of digits = observations in the group
    if(L>1){
      # split e.g. 324 into c(3, 2, 4)
      values <- as.numeric(unlist(strsplit(as.character(col[j]),"")))
      Mat <- intoMat(Mat, values)
    }
  }
  return(Mat)
}
TransitionMatrix(R, 5)
#      [,1] [,2] [,3] [,4] [,5]
# [1,]    0    0    0    0    0
# [2,]    0    0    0    1    0
# [3,]    2    1    0    0    0
# [4,]    0    0    1    0    0
# [5,]    0    0    0    0    0
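As a possibly simpler alternative to TransitionMatrix(), the decoding and counting can be vectorised with `strsplit()` and `table()`. Fixing the factor levels to 1:n makes `table()` return the full n x n grid even for ratings that never occur. This is a sketch with a hypothetical function name, applied to the R vector produced above:

```r
# The unique per-(Name, ID) codes from the answer
R <- c(11L, 2L, 1L, 1L, 1L, 31L, 3L, 1L, 2L, 1L, 3L, 3L, 3L, 31L, 324L, 1L, 43L, 3L)

# Hypothetical helper: count transitions encoded as digit sequences
transition_table <- function(codes, n = 5) {
  # decode each code into its sequence of single-digit ratings
  seqs <- lapply(strsplit(as.character(codes), ""), as.integer)
  # consecutive (from, to) pairs inside each sequence
  from <- unlist(lapply(seqs, function(s) head(s, -1)))
  to   <- unlist(lapply(seqs, function(s) tail(s, -1)))
  keep <- from != to               # unchanged ratings are not transitions
  table(from = factor(from[keep], levels = 1:n),
        to   = factor(to[keep],   levels = 1:n))
}

tt <- transition_table(R, 5)
tt
```

The result is a `table` object with the same counts as TransitionMatrix(R, 5); `unclass(tt)` or `as.matrix(tt)` turns it into a plain matrix if needed.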
The limitation of this solution is that it breaks as soon as a rating is above 9, i.e. has two digits.
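One way around that limitation is to skip the digit encoding entirely: sort by Name, ID and date, then pair each row with the previous row whenever both belong to the same (Name, ID). The base-R sketch below assumes a data frame with columns Name, Rating, ID and a sortable date column named DATE (an assumed name). In data.table, the previous value within a group could be obtained with `shift()`, for instance `DT[, prev := shift(Rating), by = .(Name, ID)]`.

```r
# Hypothetical helper: counts rating changes without packing ratings into digits
transition_counts <- function(d, levels = sort(unique(d$Rating))) {
  d <- d[order(d$Name, d$ID, d$DATE), ]
  n <- nrow(d)
  # TRUE for rows whose predecessor has the same Name and ID
  same_group <- c(FALSE, head(d$Name, -1) == tail(d$Name, -1) &
                         head(d$ID, -1)   == tail(d$ID, -1))
  from <- c(NA, head(d$Rating, -1))[same_group]  # rating on the previous date
  to   <- d$Rating[same_group]
  changed <- from != to                          # equal ratings are ignored
  table(from = factor(from[changed], levels = levels),
        to   = factor(to[changed],   levels = levels))
}

# The question's data (column name DATE is an assumption)
d <- data.frame(
  Name   = c("@0CC","@0CC","@0RM","@0RM","@0RM","@0RM",
             "@0TL","@0TL","@0TL","@0TL","@0TL",
             "@1AJ","@1AJ","@1AJ","@1AJ",
             "@1AL","@1AL","@1AL","@1AL","@1AL","@1AL","@1AL","@1AL","@1AL"),
  Rating = c(1,1,1,2,1,1, 2,3,1,3,1, 1,3,3,3, 1,4,3,4,1,2,3,3,3),
  ID     = c(71476,71476,73565,49960,44457,59451,
             73862,19824,70970,48061,48061,
             58875,56014,47340,19813,
             44416,59184,59184,72832,52718,59184,73568,72832,44416),
  DATE   = c(20000704,20001204,20000919,20000131,20001214,20001023,
             20001212,20000929,20001211,20000627,20001227,
             20001214,20001214,20001214,20001214,
             20000517,20000801,20000413,20001127,20000621,20000707,
             20001130,20001211,20000303)
)
tm <- transition_counts(d, levels = 1:5)
tm

# Hypothetical toy data on a 1-12 scale: two-digit ratings are no problem here
toy <- data.frame(Name = "X", ID = 1, Rating = c(10, 12, 3), DATE = c(1, 2, 3))
toy_tm <- transition_counts(toy, levels = 1:12)
```

Because the ratings are compared as values rather than as digits, the same function works for any rating scale; only the `levels` argument has to cover it.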