mhsmm package in R: input data format with multiple variables

Asked: 2015-04-03 10:55:53

Tags: r

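My problem is similar to the one in "R-input Format".

This is a follow-up question.

I tried the code from the link above and modified some parts to fit my data. My data has 4 variables (397 rows × 4 columns), and I want it built as a data frame with 4 variable vectors. My modified code is: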

formatMhsmm <- function(data){
  nb.sequences = nrow(data)
  nb.variables = ncol(data)
  data_df <- data.frame(matrix(unlist(data), ncol = 4, byrow = TRUE))
  # iterate over these in loops
  rows <- 1:nb.sequences
  # build vector with id value
  id = numeric(length = nb.sequences)
  for (i in rows)
  {
    id[i] = data_df[i, 2]
  }
  # build vector with time value
  time = numeric(length = nb.sequences)
  for (i in rows)
  {
    time[i] = data_df[i, 3]
  }
  # build vector with observation values
  sequences = numeric(length = nb.sequences)
  for (i in rows)
  {
    sequences[i] = data_df[i, 4]
  }
  data.df = data.frame(id, time, sequences)
  # creation of hsmm data object need for training
  N <- as.numeric(table(data.df$id))
  train <- list(x = data.df$sequences, N = N)
  class(train) <- "hsmm.data"
  return(train)
}
library(mhsmm)
dataset <- read.csv("location.csv", header = TRUE)
train <- formatMhsmm(dataset)
print(train)

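The observations in the output are not the data from column 4. Instead I get a list like (4, 8, 12, ..., 396, 1, 1, ..., 56, 192, ..., 6550, 68, NA, NA, ...), which seems to contain 1/4 of the data from each column. Why does this happen?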
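The symptom described above can be reproduced on a small toy data frame. This is only a sketch: the column names (`row`, `id`, `time`, `location`) and the values are made up for illustration and are not taken from `location.csv`. It suggests that the scrambling comes from combining `unlist()`, which flattens a data frame column by column, with `byrow = TRUE`, which refills the matrix row by row.

```r
# Hypothetical stand-in for location.csv: 8 rows, 4 columns.
toy <- data.frame(
  row      = 1:8,
  id       = c(1, 1, 1, 1, 2, 2, 2, 2),
  time     = c(1, 2, 3, 4, 1, 2, 3, 4),
  location = c(10, 20, 30, 40, 50, 60, 70, 80)
)

# unlist() concatenates the columns one after another:
# all of row, then all of id, then all of time, then all of location.
flat <- unlist(toy, use.names = FALSE)

# byrow = TRUE puts four consecutive elements of `flat` into each matrix row,
# so column 4 receives elements 4, 8, 12, ... of `flat`, a quarter of every
# original column, rather than the original fourth column.
scrambled <- matrix(flat, ncol = 4, byrow = TRUE)
scrambled[, 4]

# Indexing the data frame directly returns the intended column.
toy[[4]]
```

On this toy input, `scrambled[, 4]` starts with values from the `row` column, followed by values from `id`, `time`, and finally `location`, which matches the pattern reported in the question.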

Thanks a lot!

1 Answer:

Answer 0: (score: 1)

Why not simply count the observations per id and create the hsmm.data object directly? Assuming your data frame is called "data", we have:

N <- as.numeric(table(data$id))
train <- list(x=data$location, N = N)
class(train) <- "hsmm.data"

Taken from http://www.jstatsoft.org/v39/i04/paper
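As a rough illustration of the approach in this answer, here is a minimal sketch on made-up data. The column names `id`, `time`, and `location` are assumptions, as is the extra sorting step: `table()` reports counts in sorted order of `id`, so the sketch first orders the rows by `id` and `time` so that the observations in `x` line up with the sequence lengths in `N`.

```r
# Hypothetical data: three sequences with ids 1, 2, 3, stored out of order.
toy <- data.frame(
  id       = c(2, 2, 1, 1, 1, 3),
  time     = c(1, 2, 1, 2, 3, 1),
  location = c(50, 60, 10, 20, 30, 90)
)

# Assumption: each sequence should be contiguous and ordered by time,
# and the sequences should appear in the same order as the table() counts.
toy <- toy[order(toy$id, toy$time), ]

# Number of observations per sequence, as in the answer above.
N <- as.numeric(table(toy$id))

# Build the object by hand: x holds the observations, N the sequence lengths.
train <- list(x = toy$location, N = N)
class(train) <- "hsmm.data"

# Sanity check: the lengths in N should account for every observation in x.
sum(train$N) == length(train$x)
```

If the rows are already grouped by ascending `id`, the sorting step changes nothing and the result is the same as in the answer.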