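How can I adjust this loop so that it takes each block of 24 columns and effectively rbind()s it onto a new data frame?
I know this is far from correct, but some guidance would be great.
The idea is that after the first iteration, columns 1 to 24 (inclusive) are copied into a new (empty) data frame (new_df); the second iteration then takes columns 25 to 48 and rbind()s those values onto new_df.
The reason is that I have a large amount of data in which each column represents one hour of a day. I want to extract each one-day block and represent the data in long format.
I currently have this: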
rows <- 168
for (i in 1:rows) {
  while (rows > 0) {
    day <- df[, 1:24]
    df <- rbind(df, day)
    rows <- rows - 1
  }
}
How can I make it behave the way I described?
Some sample data:
df <- structure(c("2018-08-05 01:00:00", "0", " 0", " 0", " 0",
" 0", "0", "0", "0", "0", "0", "0", NA, NA, "2018-08-05 02:00:00",
"0", " 0", " 0", " 0", " 0", "0", "0", "0", "0", "0", "0",
NA, NA, "2018-08-05 03:00:00", "0", " 0", " 0", " 0", " 0",
"0", "0", "0", "0", "0", "0", NA, NA, "2018-08-05 04:00:00",
"0", " 0", " 0", " 0", " 0", "0", "0", "0", "0", "0", "0",
NA, NA, "2018-08-05 05:00:00", "0", " 0", " 0", " 0", " 0",
"0", "0", "0", "0", "0", "0", NA, NA, "2018-08-05 06:00:00",
"0", " 0", " 0", " 0", " 0", "0", "0", "0", "0", "0", "0",
NA, NA, "2018-08-05 07:00:00", "0", " 0", " 0", " 0", " 0",
"0", "0", "0", "0", "0", "0", NA, NA, "2018-08-05 08:00:00",
"0", " 0", " 0", " 0", " 0", "0", "0", "0", "0", "0", "0",
NA, NA, "2018-08-05 09:00:00", "0", " 0", " 0", " 0", " 0",
"0", "0", "0", "0", "0", "0", NA, NA, "2018-08-05 10:00:00",
"0", " 0", " 0", " 0", " 0", "0", "0", "0", "0", "0", "0",
NA, NA, "2018-08-05 11:00:00", "0", " 0", " 0", " 0", " 0",
"0", "0", "0", "0", "0", "0", NA, NA, "2018-08-05 12:00:00",
"0", " 0", " 0", " 0", " 0", "0", "0", "0", "0", "0", "0",
NA, NA, "2018-08-05 13:00:00", "0", " 0", " 0", " 0", " 0",
"0", "0", "0", "0", "0", "0", NA, NA, "2018-08-05 14:00:00",
"0", " 0", " 0", " 0", " 0", "0", "0", "0", "0", "0", "0",
NA, NA, "2018-08-05 15:00:00", "0", " 0", " 0", " 0", " 0",
"0", "0", "0", "0", "0", "0", NA, NA, "2018-08-05 16:00:00",
"0", " 0", " 0", " 0", " 0", "0", "0", "0", "0", "0", "0",
NA, NA, "2018-08-05 17:00:00", "0", " 0", " 0", " 0", " 0",
"0", "0", "0", "0", "0", "0", NA, NA, "2018-08-05 18:00:00",
"0", " 0", " 0", " 0", " 0", "0", "0", "0", "0", "0", "0",
NA, NA, "2018-08-05 19:00:00", "0", " 0", " 0", " 0", " 0",
"0", "0", "0", "0", "0", "0", NA, NA, "2018-08-05 20:00:00",
"0", " 0", " 0", " 0", " 0", "0", "0", "0", "0", "0", "0",
NA, NA, "2018-08-05 21:00:00", "0", " 0", " 0", " 0", " 0",
"0", "0", "0", "0", "0", "0", NA, NA, "2018-08-05 22:00:00",
"0", " 0", " 0", " 0", " 0", "0", "0", "0", "0", "0", "0",
NA, NA, "2018-08-05 23:00:00", "0", " 0", " 0", " 0", " 0",
"0", "0", "0", "0", "0", "0", NA, NA, "2018-08-06 00:00:00",
"0", " 0", " 0", " 0", " 0", "0", "0", "0", "0", "0", "0",
NA, NA, "2018-08-06 01:00:00", "0", " 0", " 0", " 0", " 0",
"0", "0", "0", "0", "0", "0", NA, NA, "2018-08-06 02:00:00",
"0", " 0", " 0", " 0", " 0", "0", "0", "0", "0", "0", "0",
NA, NA, "2018-08-06 03:00:00", "0", " 0", " 0", " 0", " 0",
"0", "0", "0", "0", "0", "0", NA, NA, "2018-08-06 04:00:00",
"0", " 0", " 0", " 4", " 6", "0", "0", "0", "0", "0", "0",
"NA", NA, "2018-08-06 05:00:00", "0", " 0", " 0", " 0", " 0",
"0", "0", "0", "0", "0", "0", NA, NA, "2018-08-06 06:00:00",
"0", " 0", " 0", " 0", " 0", "0", "0", "0", "0", "0", "0",
NA, NA, "2018-08-06 07:00:00", "0", " 0", " 0", " 0", " 0",
"0", "0", "0", "0", "0", "0", NA, NA, "2018-08-06 08:00:00",
"0", " 0", " 0", " 0", " 0", "0", "0", "0", "0", "0", "0",
NA, NA, "2018-08-06 09:00:00", "1", " 6", " 0", " 1", " 0",
"0", "2", "2", "0", "0", "0", NA, "0.0000000", "2018-08-06 10:00:00",
"1", "48", " 774", " 754", " 5", "1", "2", "2", "0", "0", "1",
"0.2", "0.3333333", "2018-08-06 11:00:00", "1", "13", " 322",
"1423", " 7", "1", "0", "0", "0", "0", "0", "0.142857142857143",
NA, "2018-08-06 12:00:00", "2", " 2", " 51", "1672", " 3", "1",
"2", "2", "0", "0", "0", "0.333333333333333", "0.0000000", "2018-08-06 13:00:00",
"2", "60", "1377", " 324", "10", "3", "3", "3", "0", "0", "0",
"0.3", "0.0000000", "2018-08-06 14:00:00", "2", "51", "1009",
" 478", " 3", "1", "0", "0", "0", "0", "0", "0.333333333333333",
NA, "2018-08-06 15:00:00", "4", "60", "1196", " 292", " 7", "0",
"1", "1", "0", "0", "0", "NA", "0.0000000", "2018-08-06 16:00:00",
"3", "60", "1329", " 378", "15", "1", "0", "0", "0", "0", "0",
"0.0666666666666667", NA, "2018-08-06 17:00:00", "2", "22", " 481",
" 995", " 8", "2", "3", "3", "0", "0", "0", "0.25", "0.0000000",
"2018-08-06 18:00:00", "1", "28", " 391", " 789", " 5", "2",
"2", "2", "0", "0", "0", "0.4", "0.0000000", "2018-08-06 19:00:00",
"1", "60", "1169", " 301", " 8", "0", "0", "0", "0", "0", "0",
"NA", NA, "2018-08-06 20:00:00", "1", "60", "2442", " 421", "33",
"1", "0", "0", "0", "0", "1", "0.0303030303030303", "1.0000000",
"2018-08-06 21:00:00", "1", " 1", " 9", "2474", " 0", "0",
"0", "0", "0", "0", "0", NA, NA, "2018-08-06 22:00:00", "0",
" 0", " 0", "2353", " 1", "0", "0", "0", "0", "0", "0", "NA",
NA, "2018-08-06 23:00:00", "0", " 0", " 0", "1430", " 0", "0",
"0", "0", "0", "0", "0", NA, NA, "2018-08-07 00:00:00", "0",
" 0", " 0", "1019", " 0", "0", "0", "0", "0", "0", "0", NA,
NA, "2018-08-07 01:00:00", "0", " 0", " 0", " 805", " 0", "0",
"0", "0", "0", "0", "0", NA, NA, "2018-08-07 02:00:00", "0",
" 0", " 0", " 673", " 0", "0", "0", "0", "0", "0", "0", NA,
NA), .Dim = c(14L, 50L), .Dimnames = list(c("hour", "associate_count",
"minutes_covered", "plugin_loads", "plugin_unloads", "plugin_opens",
"chats_started", "claimed_chats", "completed_chats", "sales.number_of_orders",
"sales.subtotal", "missed_chats", "pct_resulting_in_chat", "missed_pct"
), c("1", "2", "3", "4", "5", "6", "7", "8", "9", "10", "11",
"12", "13", "14", "15", "16", "17", "18", "19", "20", "21", "22",
"23", "24", "25", "26", "27", "28", "29", "30", "31", "32", "33",
"34", "35", "36", "37", "38", "39", "40", "41", "42", "43", "44",
"45", "46", "47", "48", "49", "50")))
Answer 0 (score: 0)
If you want the data in long format and per day, I suggest transposing the data frame and then using an aggregation function:
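library(dplyr)
# transpose so that each hour becomes a row; keep the columns as character rather than factor
new_df <- as.data.frame(t(df), stringsAsFactors = FALSE)
# create the date column, keeping only the date part
new_df$date <- format(as.Date(new_df$hour, format = "%Y-%m-%d %H:%M:%S"), format = "%Y-%m-%d")
# aggregate per day; the values are stored as text, so convert them before summing
agg_df <- new_df %>%
  group_by(date) %>%
  summarise(Tot_associate_count = sum(as.numeric(associate_count)))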
Add the remaining columns in the same way.
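Rather than writing one sum() per column, the same per-day aggregation might be applied to every measurement column at once. The following is only a sketch: it assumes dplyr 1.0.0 or later (for across()), and `toy` is a small made-up matrix with the same layout as the question's data, not the real data.

```r
library(dplyr)

# Toy stand-in for the question's matrix: variables in rows, one column per hour.
# The values are made up for illustration.
hours <- format(seq(as.POSIXct("2018-08-05 01:00:00", tz = "UTC"),
                    by = "hour", length.out = 48),
                "%Y-%m-%d %H:%M:%S")
toy <- rbind(hour = hours,
             associate_count = as.character(rep(1, 48)),
             chats_started = as.character(rep(2, 48)))

new_df <- as.data.frame(t(toy), stringsAsFactors = FALSE)
new_df$date <- as.Date(substr(new_df$hour, 1, 10))

# Sum every column except hour, per day.
# The grouping column (date) is left out of across() automatically.
agg_df <- new_df %>%
  group_by(date) %>%
  summarise(across(-hour, ~ sum(as.numeric(.x), na.rm = TRUE)))
```

Summing is only one choice; ratio columns such as pct_resulting_in_chat would probably need a different summary (for example a mean), which is a decision the original answer does not address.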
Answer 1 (score: 0)
One approach:
# number of complete 24-column blocks (any leftover columns are ignored)
N <- ncol(df) %/% 24
# pre-allocate a list to hold the blocks
mylist <- vector("list", N)
# split the data into groups of 24 columns
for (i in seq_len(N)) {
  mylist[[i]] <- df[, ((i - 1) * 24 + 1):(i * 24)]
}
# rbind all 24-column blocks together
do.call(rbind, mylist)
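The same block-wise split can also be sketched without an explicit loop by grouping the column indices. This is an illustrative variant, not part of the original answer; `toy`, `day_id`, `blocks`, `full`, and `stacked` are hypothetical names, and the matrix values are made up.

```r
# Toy matrix: 3 variables (rows) x 50 hourly columns, filled column by column.
toy <- matrix(as.character(seq_len(3 * 50)), nrow = 3,
              dimnames = list(c("a", "b", "c"), as.character(1:50)))

# Assign each column to a consecutive run of 24 (the last run may be shorter).
day_id <- ceiling(seq_len(ncol(toy)) / 24)
blocks <- lapply(split(seq_len(ncol(toy)), day_id),
                 function(idx) toy[, idx, drop = FALSE])

# rbind() needs equal column counts, so keep only the complete 24-column blocks.
full <- Filter(function(b) ncol(b) == 24, blocks)
stacked <- do.call(rbind, unname(full))
dim(stacked)
```

With 50 columns this yields two complete blocks plus a 2-column remainder, so the stacked result has 6 rows and 24 columns. Note that the column names of the result come from the first block only.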
Answer 2 (score: 0)
To expand on @Lynbakr's point, you can get everything into long format:
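library(tidyr)
library(dplyr)
df2 <- data.frame(t(df), stringsAsFactors = FALSE) %>%
  gather(measurement_type, measurement, -hour) %>%
  # as.numeric rather than as.integer, so fractional columns such as pct_resulting_in_chat are not truncated
  mutate_at(vars(measurement), as.numeric)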
> head(df2)
                 hour measurement_type measurement
1 2018-08-05 01:00:00  associate_count           0
2 2018-08-05 02:00:00  associate_count           0
3 2018-08-05 03:00:00  associate_count           0
4 2018-08-05 04:00:00  associate_count           0
5 2018-08-05 05:00:00  associate_count           0
6 2018-08-05 06:00:00  associate_count           0
If you really want to split them into separate data.frames, return them as a list:
df2 %>%
split(lubridate::day(.$hour))
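This assumes that all measurements are numeric. If you want each measurement as its own column, or if they are not all numeric (e.g. there is a categorical variable), you can drop the gather call.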
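A minimal sketch of that wide variant, without gather, might look like the following. It uses base R only; `toy` is a tiny made-up matrix in the same layout as the question's data, and `wide` and `by_day` are hypothetical names. Splitting on the full date rather than the day of the month is a substitution made here so that the same day number in different months is not merged.

```r
# Toy stand-in for the question's matrix; values are made up.
toy <- rbind(hour = c("2018-08-05 23:00:00", "2018-08-06 00:00:00",
                      "2018-08-06 01:00:00"),
             associate_count = c("0", "1", "2"),
             pct_resulting_in_chat = c(NA, "0.2", "NA"))

# One row per hour, one column per measurement.
wide <- data.frame(t(toy), stringsAsFactors = FALSE)

# Convert each measurement column on its own, so integer, fractional and
# text columns each get a suitable type; trimws() drops padding such as " 0".
wide[-1] <- lapply(wide[-1], function(x) type.convert(trimws(x), as.is = TRUE))

# Parse the timestamps, then split into one data frame per calendar date.
wide$hour <- as.POSIXct(wide$hour, tz = "UTC")
by_day <- split(wide, as.Date(wide$hour))
```

Here by_day is a list with one data frame per date, named by that date.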