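My question is similar to the problem of R-input Format.
Following up on that question, I tried the code from the link above and modified some parts to fit my data. My data looks like the following. I want my data to be built into a data frame with 4 variable vectors. My modified code is: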
formatMhsmm <- function(data){
  nb.sequences = nrow(data)
  nb.variables = ncol(data)
  data_df <- data.frame(matrix(unlist(data), ncol = 4, byrow = TRUE))

  # iterate over these in loops
  rows <- 1:nb.sequences

  # build vector with id value
  id = numeric(length = nb.sequences)
  for (i in rows) {
    id[i] = data_df[i, 2]
  }

  # build vector with time value
  time = numeric(length = nb.sequences)
  for (i in rows) {
    time[i] = data_df[i, 3]
  }

  # build vector with observation values
  sequences = numeric(length = nb.sequences)
  for (i in rows) {
    sequences[i] = data_df[i, 4]
  }

  data.df = data.frame(id, time, sequences)

  # creation of hsmm data object needed for training
  N <- as.numeric(table(data.df$id))
  train <- list(x = data.df$sequences, N = N)
  class(train) <- "hsmm.data"
  return(train)
}

library(mhsmm)
dataset <- read.csv("location.csv", header = TRUE)
train <- formatMhsmm(dataset)
print(train)
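The observations in the output are not the data from the 4th column. Instead they are a list like (4, 8, 12, ..., 396, 1, 1, ..., 56, 192, ..., 6550, 68, NA, NA, ...). It has picked up 1/4 of the data from each column. Why does this happen?
Thanks very much!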
Answer 0: (score: 1)
Why not simply count the observations per id and create the hsmm.data object directly? Assuming your data frame is called "data", we have:
N <- as.numeric(table(data$id))
train <- list(x=data$location, N = N)
class(train) <- "hsmm.data"
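
As for why the original function picks up a quarter of each column: unlist() flattens a data frame column by column, and matrix(..., byrow = TRUE) then refills it row by row, so the 4th column of the result ends up holding every 4th value of the flattened vector. Here is a minimal sketch with made-up values. The `toy` data frame and the column name `X` are assumptions for illustration, not the contents of location.csv; `id` and `location` follow the names used above.

```r
# Toy data frame with 4 columns (hypothetical values, not the asker's file)
toy <- data.frame(
  X        = 1:4,
  id       = c(1, 1, 2, 2),
  time     = c(1, 2, 1, 2),
  location = c(10, 20, 30, 40)
)

# unlist() walks the data frame one column at a time:
# 1 2 3 4 | 1 1 2 2 | 1 2 1 2 | 10 20 30 40
flat <- unlist(toy, use.names = FALSE)

# byrow = TRUE puts 4 consecutive values into each row, so column 4
# receives elements 4, 8, 12, 16 of `flat` instead of the location column
scrambled <- matrix(flat, ncol = 4, byrow = TRUE)
print(scrambled[, 4])   # 4 2 2 40

# The default column-wise fill keeps the original layout
restored <- matrix(flat, ncol = 4)
print(restored[, 4])    # 10 20 30 40

# No reshaping is needed at all: index the columns by name
N <- as.numeric(table(toy$id))             # observations per id: 2 2
train <- list(x = toy$location, N = N)
class(train) <- "hsmm.data"
```

One thing to keep in mind: table() reports its counts in sorted id order, so this assumes the rows are already grouped by id in that same order. If they might not be, sorting first with something like `data <- data[order(data$id, data$time), ]` would be a reasonable precaution.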