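My dataset is a list ("sportdata") of 1000 elements of type data.frame. Each data.frame in the list represents one minute of data and has exactly the same number and names of columns. Each data.frame has at most 45 IDs (i.e. 45 rows, but in some minutes one or more IDs are missing, so it can be e.g. 35 rows). I want to combine and average the complete dataset in chunks of 15 data.frames, put the results into one data.frame, and transpose it so that the IDs are the columns and each 15-minute average SpeedKph is a row.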
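The block rule described here (frames 1-15 form the first 15-minute block, 16-30 the second, and so on) reduces to integer arithmetic on each frame's position in the list. A minimal base-R sketch, assuming the 1000 frames are stored in chronological order; `n_frames`, `block_size`, `block` and `frames_per_block` are illustrative names, not objects from the question:

```r
n_frames   <- 1000  # number of one-minute data.frames in the list
block_size <- 15    # minutes per block

# Frame i (1-based) falls in block (i - 1) %/% block_size + 1:
# frames 1-15 -> block 1, 16-30 -> block 2, ...
block <- (seq_len(n_frames) - 1) %/% block_size + 1

# Frames per block; 1000 = 66 * 15 + 10, so block 67 holds only 10 frames.
frames_per_block <- as.vector(table(block))
```

Assuming length(sportdata) is 1000, split(sportdata, block) would then give 67 sub-lists of up to 15 frames each.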
My list of data.frames looks like this:
head(sportdata)
[[1]]
ID Distance SpeedKph
1: 1 2247 73
2: 2 2247 73
3: 3 1970 73
4: 4 1964 74
5: 5 1971 73
[[2]]
ID Distance SpeedKph
1: 1 2247 73
2: 2 2247 75
3: 3 1970 73
4: 4 1964 74
5: 5 1971 73
[[3]]
ID Distance SpeedKph
1: 1 2247 73
2: 2 2247 80
3: 3 1970 73
4: 4 1964 74
5: 5 1971 56
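I have the code below to combine and average all data.frames in my list, but I have not yet found a way to combine and average the list per 15 elements (i.e. 15 minutes) and put the results into one data.frame.
library(data.table)
# averages each ID over ALL frames at once (no 15-minute blocks yet)
dfTotal <- rbindlist(sportdata)[, lapply(.SD, mean), by = list(ID)]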
Ideally, my output data.frame would look like this (IDs as columns):
Data.frames | ID 1 | ID 2 | ID 3 | ...
01-15       |   73 |   74 |   74 | ...
16-30       |   75 |   77 |   74 | ...
31-45       |   74 |   74 |   79 | ...
46-60       |   78 |   72 |   74 | ...
...
Thanks in advance for your help!
Update: sorry for not doing this right away; here is my reproducible example.
my.df1 <- data.frame(ID = c(1:5),
Distance = c(2247,2247,1970,1964,1971),
SpeedKph = c(73,73,74,73,75))
my.df2 <- data.frame(ID = c(1:5),
Distance = c(2247,2247,1970,1964,1971),
SpeedKph = c(73,73,74,73,75))
my.df3 <- data.frame(ID = c(1:5),
Distance = c(2247,2247,1970,1964,1971),
SpeedKph = c(75,70,80,71,83))
my.list <- list(list1 = my.df1, list2 = my.df2, list3 = my.df3)
Answer 0 (score: 3)
A possible solution with data.table (which you are already using):
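library(data.table)
# my.list must be unnamed (as in the extended data below) so that 'id' is an integer frame number
DT <- rbindlist(my.list, idcol = 'id')
# grp: block number, 3 frames per block here; use %/% 15 for 15-minute blocks
DT[, grp := (id - 1) %/% 3
][, c(frames = toString(id), lapply(.SD, mean)), by = .(grp, ID), .SDcols = 3:4
][, dcast(.SD, frames ~ ID, value.var = c('Distance','SpeedKph'))]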
This gives:
     frames Distance_1 Distance_2 Distance_3 Distance_4 Distance_5 SpeedKph_1 SpeedKph_2 SpeedKph_3 SpeedKph_4 SpeedKph_5
1: 1, 2, 3       2247   2247.000   1970.000   1964.000       1971   73.66667   72.00000   76.00000   72.33333   77.66667
2: 4, 5, 6       2229   2410.333   1962.667   1964.333       1966   74.66667   73.66667   77.33333   72.33333   77.66667
Extended example data (used to produce the output above):
my.df1 <- data.frame(ID = c(1:5), Distance = c(2247,2247,1970,1964,1971), SpeedKph = c(73,73,74,73,75))
my.df2 <- data.frame(ID = c(1:5), Distance = c(2247,2247,1970,1964,1971), SpeedKph = c(73,73,74,73,75))
my.df3 <- data.frame(ID = c(1:5), Distance = c(2247,2247,1970,1964,1971), SpeedKph = c(75,70,80,71,83))
my.df4 <- data.frame(ID = c(1:5), Distance = c(2247,2137,1948,1965,1971), SpeedKph = c(73,78,74,73,71))
my.df5 <- data.frame(ID = c(1:5), Distance = c(2223,2247,1970,1964,1971), SpeedKph = c(76,73,74,73,79))
my.df6 <- data.frame(ID = c(1:5), Distance = c(2217,2847,1970,1964,1956), SpeedKph = c(75,70,84,71,83))
my.list <- list(my.df1, my.df2, my.df3, my.df4, my.df5, my.df6)
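For comparison, the same block-average-and-transpose can be sketched in base R without data.table. This is an alternative, not a reproduction of the answer's code; it assumes the frames are in chronological order in the list and averages only SpeedKph, as the question asks. `block_size`, `all_rows` and `wide` are illustrative names:

```r
# Extended example data from above (six one-minute frames, IDs 1-5).
my.list <- list(
  data.frame(ID = 1:5, Distance = c(2247, 2247, 1970, 1964, 1971), SpeedKph = c(73, 73, 74, 73, 75)),
  data.frame(ID = 1:5, Distance = c(2247, 2247, 1970, 1964, 1971), SpeedKph = c(73, 73, 74, 73, 75)),
  data.frame(ID = 1:5, Distance = c(2247, 2247, 1970, 1964, 1971), SpeedKph = c(75, 70, 80, 71, 83)),
  data.frame(ID = 1:5, Distance = c(2247, 2137, 1948, 1965, 1971), SpeedKph = c(73, 78, 74, 73, 71)),
  data.frame(ID = 1:5, Distance = c(2223, 2247, 1970, 1964, 1971), SpeedKph = c(76, 73, 74, 73, 79)),
  data.frame(ID = 1:5, Distance = c(2217, 2847, 1970, 1964, 1956), SpeedKph = c(75, 70, 84, 71, 83))
)
block_size <- 3  # 3 for this toy data; 15 for the real one-minute frames

# Stack the frames and record each row's list position as an integer,
# which works whether or not the list has names.
all_rows <- do.call(rbind, my.list)
all_rows$frame <- rep(seq_along(my.list), vapply(my.list, nrow, integer(1)))
all_rows$grp   <- (all_rows$frame - 1) %/% block_size

# Mean SpeedKph per block (rows) and ID (columns); empty cells become NA.
wide <- tapply(all_rows$SpeedKph, list(grp = all_rows$grp, ID = all_rows$ID), mean)
```

For the real data, set block_size to 15 and use sportdata in place of the toy list; an ID missing from an entire block then shows up as NA in that row.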
In response to the comments (some data.frames are missing one or more IDs):
# create some extra example data
my.df4a <- my.df4[-4,]
my.df5a <- my.df5[-c(4,5),]
my.df6a <- my.df6[-c(3,4),]
my.df7 <- my.df4[-c(4:6),]
my.df8 <- my.df5[-c(4:6),]
my.df9 <- my.df6[-c(4:6),]
# make another list of 9 dataframes
my.list2 <- list(my.df1, my.df2, my.df3, my.df4a, my.df5a, my.df6a, my.df7, my.df8, my.df9)
# bind that list together in one data.table
DT2 <- rbindlist(my.list2, idcol = 'dfid')
# do an 'expand join' with 'CJ' and add the original transformation
DT2[CJ(dfid = dfid, ID = ID, unique = TRUE), on = .(dfid, ID)
][, grp := (dfid - 1) %/% 3
][, c(frames = toString(dfid), lapply(.SD, mean, na.rm = TRUE)), by = .(grp, ID), .SDcols = 3:4
][, dcast(.SD, frames ~ ID, value.var = c('Distance','SpeedKph'))]
This gives:
     frames Distance_1 Distance_2 Distance_3 Distance_4 Distance_5 SpeedKph_1 SpeedKph_2 SpeedKph_3 SpeedKph_4 SpeedKph_5
1: 1, 2, 3       2247   2247.000   1970.000       1964     1971.0   73.66667   72.00000   76.00000   72.33333   77.66667
2: 4, 5, 6       2229   2410.333   1959.000        NaN     1963.5   74.66667   73.66667   74.00000        NaN   77.00000
3: 7, 8, 9       2229   2410.333   1962.667        NaN        NaN   74.66667   73.66667   77.33333        NaN        NaN
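The expand join matters because the frames label is built with toString(dfid) per (grp, ID): if an ID is absent from some frames, its label differs from the other IDs' labels and dcast would put it on a separate row. Filling every (frame, ID) combination with NA keeps the labels identical, and mean(..., na.rm = TRUE) then skips the NAs; an ID missing from an entire block averages to NaN, which is what appears for ID 4 above. A minimal base-R illustration of that filling step on hypothetical toy data, using expand.grid() plus merge(all.x = TRUE) in place of data.table's CJ() join:

```r
# Hypothetical toy data in long form: two frames, ID 3 is missing from frame 2.
long <- data.frame(
  dfid     = c(1, 1, 1, 2, 2),
  ID       = c(1, 2, 3, 1, 2),
  SpeedKph = c(73, 78, 74, 76, 73)
)

# Frame label per ID before filling: ID 3 only gets "1".
before <- tapply(long$dfid, long$ID, toString)

# All (frame, ID) combinations, filled in with a left join; missing rows get NA.
grid <- expand.grid(dfid = sort(unique(long$dfid)), ID = sort(unique(long$ID)))
full <- merge(grid, long, by = c("dfid", "ID"), all.x = TRUE)
full <- full[order(full$ID, full$dfid), ]

# After filling, every ID carries the same frame label "1, 2".
after <- tapply(full$dfid, full$ID, toString)

# na.rm = TRUE skips the filled-in NA; an ID with no data at all gives NaN.
avg <- tapply(full$SpeedKph, full$ID, mean, na.rm = TRUE)
all_missing <- mean(c(NA_real_, NA_real_), na.rm = TRUE)
```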
Regarding row order (include grp in the dcast formula so the blocks stay in order):
my.df10 <- my.df4
my.df11 <- my.df5
my.df12 <- my.df6
my.list3 <- list(my.df1, my.df2, my.df3, my.df4a, my.df5a, my.df6a, my.df7, my.df8, my.df9, my.df10, my.df11, my.df12)
DT3 <- rbindlist(my.list3, idcol = 'dfid')
DT3[CJ(dfid = dfid, ID = ID, unique = TRUE), on = .(dfid, ID)
][, grp := (dfid - 1) %/% 3
][, c(frames = toString(dfid), lapply(.SD, mean, na.rm = TRUE)), by = .(grp, ID), .SDcols = 3:4
][, dcast(.SD, grp + frames ~ ID, value.var = c('Distance','SpeedKph'))]
This gives:
   grp     frames Distance_1 Distance_2 Distance_3 Distance_4 Distance_5 SpeedKph_1 SpeedKph_2 SpeedKph_3 SpeedKph_4 SpeedKph_5
1:   0    1, 2, 3       2247   2247.000   1970.000   1964.000     1971.0   73.66667   72.00000   76.00000   72.33333   77.66667
2:   1    4, 5, 6       2229   2410.333   1959.000        NaN     1963.5   74.66667   73.66667   74.00000        NaN   77.00000
3:   2    7, 8, 9       2229   2410.333   1962.667        NaN        NaN   74.66667   73.66667   77.33333        NaN        NaN
4:   3 10, 11, 12       2229   2410.333   1962.667   1964.333     1966.0   74.66667   73.66667   77.33333   72.33333   77.66667
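The grp term matters because frames is a character column: sorted as text, "10, 11, 12" comes before "4, 5, 6". A quick base-R check of that ordering, using sort(method = "radix"), which orders strings in the C locale (data.table likewise orders strings in the C locale); the vectors below simply restate the labels from the output:

```r
frames <- c("1, 2, 3", "4, 5, 6", "7, 8, 9", "10, 11, 12")
grp    <- c(0, 1, 2, 3)

# Sorting the labels as text moves "10, 11, 12" ahead of "4, 5, 6".
by_label <- sort(frames, method = "radix")

# Sorting by the numeric block index keeps chronological order.
by_grp <- frames[order(grp)]
```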
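Answer 1 (score: 0)
Once you have the complete dataset, try the following.
To reduce the data frame to blocks of 15 rows, first add a column 1:nrow(df); in this example we use 1:1000.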
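library(tidyverse)
set.seed(1)  # reproducible example data
DF <- data.frame(mean_speed = sample(40:100, 1000, replace = TRUE))
DF2 <- DF %>%
  mutate(index = 1:nrow(.),
         # interval label for each block of 15 rows: (0,15], (15,30], ...
         group = cut(index, c(seq(0, nrow(.), 15), nrow(.)))) %>%
  group_by(group) %>%
  mutate(row_num = row_number()) %>%  # 1..15 within each block
  select(-index) %>%
  spread(row_num, mean_speed)         # one row per block, one column per position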
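We first cut the rows into consecutive blocks of 15. Then we group by those blocks and set the row number, which adds 1:15 within each block. Next we deselect everything except the group and the mean speed. Finally, we spread the data into wide format.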
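One caveat with the cut() breaks above: when the number of rows is an exact multiple of 15, the last value of seq(0, n, 15) equals n, the appended n repeats it, and cut() stops with "'breaks' are not unique". A minimal base-R sketch comparing the answer's breaks with plain integer block numbers via ceiling(); make_groups_cut and make_groups_int are hypothetical helper names:

```r
# Group label for each row, as in the answer: breaks 0, 15, 30, ..., plus n.
make_groups_cut <- function(n) cut(seq_len(n), c(seq(0, n, 15), n))

# Equivalent integer block number without cut(): rows 1-15 -> 1, 16-30 -> 2, ...
make_groups_int <- function(n) ceiling(seq_len(n) / 15)

g_cut <- make_groups_cut(1000)
g_int <- make_groups_int(1000)

# With n = 30 the break 30 appears twice, so cut() raises an error;
# the integer version still works.
cut_30 <- tryCatch(make_groups_cut(30), error = function(e) conditionMessage(e))
int_30 <- make_groups_int(30)
```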
Edit: given your latest information, I would try the following:
DF2 <- dfTotal %>%
mutate(group = cut(ID, c(seq(0, nrow(.), 15), nrow(.)))) %>%
group_by(group) %>%
select(-Distance) %>%
spread(ID, SpeedKph)
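One thing I am not sure about is whether the IDs in your larger data frame run 1:1000 or 1:15. It would help if you could share 50 rows of your dataset. If the IDs are 1:15, you should be able to use the code above. If they are 1:1000, you need to add mutate(row_num = row_number()).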