Run a function on length-N subsets of a data frame and store the results in a list

Time: 2018-03-30 00:08:02

Tags: r list dataframe

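I am using the depmixS4 package to train an HMM on my data. However, my dataset is not continuous (I have data for Saturdays and Sundays, back to back, but with a different number of observations each week). For example, my dataset might look like this: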

1   Saturday  Evening  16.2  235.84
2   Saturday  Evening  23.4  235.29
3   Saturday  Evening  29.4  232.79
4   Sunday    Evening  24.2  233.89
5   Sunday    Evening  24.2  233.66
6   Sunday    Evening  24.2  233.38
7   Sunday    Evening  24.2  232.99
8   Sunday    Evening  25.4  233.21
9   Sunday    Evening  26.8  232.37
10  Saturday  Night    25.6  231.55
11  Saturday  Night    24.4  231.19
12  Saturday  Night    24.4  231.63
13  Saturday  Night    24.4  231.71
14  Sunday    Night    25.2  231.23
15  Sunday    Night    25.2  231.23
16  Saturday  Night    25.2  231.23
17  Saturday  Night    25.2  231.23
18  Sunday    Night    25.2  231.23

df = structure(list(V2 = c("Saturday", "Saturday", "Saturday", "Sunday", 
"Sunday", "Sunday", "Sunday", "Sunday", "Sunday", "Saturday", 
"Saturday", "Saturday", "Saturday", "Sunday", "Sunday", "Saturday", 
"Saturday", "Sunday"), V3 = c("Evening", "Evening", "Evening", 
"Evening", "Evening", "Evening", "Evening", "Evening", "Evening", 
"Night", "Night", "Night", "Night", "Night", "Night", "Night", 
"Night", "Night"), V4 = c(16.2, 23.4, 29.4, 24.2, 24.2, 24.2, 
24.2, 25.4, 26.8, 25.6, 24.4, 24.4, 24.4, 25.2, 25.2, 25.2, 25.2, 
25.2), V5 = c(235.84, 235.29, 232.79, 233.89, 233.66, 233.38, 
232.99, 233.21, 232.37, 231.55, 231.19, 231.63, 231.71, 231.23, 
231.23, 231.23, 231.23, 231.23)), .Names = c("V2", "V3", "V4", 
"V5"), row.names = c(NA, -18L), class = "data.frame")

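In this example, set 1 has 9 observations, set 2 has 6, and set 3 has 3. I already have a list containing the number of observations in each set, in order: [9,6,3]. I want to use that list to subset the data, pass each subset to the depmix function, fit the model, and store the log-likelihood of each fitted model in a list using a for loop.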

For example:

set.seed(1)
mod[i] <- depmix(list(V4~1, V5~1), data = dataset[i], nstates=10, family=list(gaussian(),  gaussian()))
fm[i] <- fit(mod[i])
append(resultList, fm[i])

#Where [i] is the iteration of the loop, and dataset[i] corresponds to the i'th subset of length N corresponding to the i'th element in the list (in the example, the list is [9,6,3])

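I realize this is really two questions: one about subsetting a data frame using a list, and another about running a function and inserting the results into a list.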

2 Answers:

Answer 0 (score: 0)

As r2evans suggested, I split df like this:

split_data = split(df, rep(1:3, times=c(9,6,3)))
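The same split can be driven by the list of group sizes instead of hard-coded values. The following is a minimal sketch in base R; the names `sizes`, `toy`, `grp`, and `parts` are chosen here for illustration, and `toy` is a placeholder data frame rather than the question's data.

```r
# Group sizes as given in the question; `sizes` is a name chosen for this sketch.
sizes <- c(9, 6, 3)

# Placeholder data frame with one row per observation (18 rows in total).
toy <- data.frame(id = seq_len(sum(sizes)))

# One group label per row: 1 repeated 9 times, 2 repeated 6 times, 3 repeated 3 times.
grp <- rep(seq_along(sizes), times = sizes)

# split() returns a named list of data frames, one per group label.
parts <- split(toy, grp)

# Row count of each subset, to confirm the split matches `sizes`.
sizes_out <- vapply(parts, nrow, integer(1))
```

Because `grp` is built from `sizes`, changing the number of observations per week only requires changing that one vector.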

I then used sapply with a function I wrote, so that it runs once per subset:

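library(depmixS4)

runIndividualWeeks <- function(data_input){
    # Build a 10-state HMM with two Gaussian responses (V4 and V5, as in the question)
    mod = depmix(list(V4~1, V5~1), data = data_input, nstates=10, family=list(gaussian(), gaussian()))
    # Fit the model and return the fitted object
    fit(mod, verbose = FALSE)
}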

results <- sapply(split_data, runIndividualWeeks)
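The question also asks for the log-likelihood of each fit to be stored in a list. The sketch below shows that pattern under loud assumptions: it uses simulated placeholder data, and an intercept-only `lm()` stands in for `depmix()` plus `fit()` so that it runs with base R alone. The helper name `fit_one` is hypothetical. With depmixS4 loaded, the stand-in would be swapped for the fitting function above; the package documents a `logLik()` method for its model objects, but that substitution is not tested here.

```r
# Simulated placeholder data (not the question's values), 18 rows as in the example.
set.seed(1)
sim <- data.frame(V4 = rnorm(18), V5 = rnorm(18))
split_data <- split(sim, rep(1:3, times = c(9, 6, 3)))

# Hypothetical stand-in for depmix() + fit(): an intercept-only linear model.
fit_one <- function(d) lm(V5 ~ 1, data = d)

# lapply() always returns a list, one fitted model per subset.
models <- lapply(split_data, fit_one)

# Extract one log-likelihood per model and keep the results in a list.
loglik_list <- lapply(models, function(m) as.numeric(logLik(m)))
```

Using `lapply` rather than `sapply` keeps the return type predictable: the result is always a list with one element per subset, named after the group labels.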

Answer 1 (score: 0)

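Consider by (an object-oriented wrapper for tapply), an often underused split-apply function that, in a single call, returns a list of the function's return values, one per subset.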
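A minimal sketch of the `by` approach follows. It is an illustration under stated assumptions, not the answer's original code: the data are simulated placeholders, and an intercept-only `lm()` stands in for the depmixS4 model so the block runs with base R. The names `sim`, `grp`, and `res` are chosen here.

```r
# Simulated placeholder data (not the question's values), 18 rows as in the example.
set.seed(1)
sim <- data.frame(V4 = rnorm(18), V5 = rnorm(18))

# Group label for each row, built from the sizes 9, 6, 3.
grp <- rep(1:3, times = c(9, 6, 3))

# by() splits `sim` by `grp` and calls the function once per subset.
# simplify = FALSE keeps the result list-like instead of collapsing it to a vector.
res <- by(sim, grp, function(sub) {
  # Stand-in model; a depmix() + fit() call would go here instead.
  as.numeric(logLik(lm(V5 ~ 1, data = sub)))
}, simplify = FALSE)

# Individual results are reached by position, e.g. the first subset:
first_loglik <- res[[1]]
```

Compared with `split` followed by `lapply`, `by` does the splitting and the function application in one call.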