How can I take every 24 columns from one data frame and append them to the bottom of another data frame?

Asked: 2018-08-15 16:30:57

Tags: r

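How can I adjust this loop so that it takes each block of 24 columns and effectively rbind()s it onto a new data frame?

I know this is far from correct, but some guidance would be great.

The idea is that after the first iteration, columns 1 to 24 (inclusive) are copied to a new (empty) data frame (new_df); the second iteration then takes columns 25 to 48 and rbind()s those values onto new_df, and so on.

The reason is that I have a large amount of data in which each column represents one hour of a day. I want to extract each one-day block and represent the data in long format.

I currently have this: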

rows <- 168

for(i in 1:rows){
  while (rows > 0) {
    day <- df[, 1:24]
    df <- rbind(df, day)
    rows <- rows - 1
  }
}

How can I make it behave the way I described?

Some sample data:

df <- structure(c("2018-08-05 01:00:00", "0", " 0", "   0", "   0", 
" 0", "0", "0", "0", "0", "0", "0", NA, NA, "2018-08-05 02:00:00", 
"0", " 0", "   0", "   0", " 0", "0", "0", "0", "0", "0", "0", 
NA, NA, "2018-08-05 03:00:00", "0", " 0", "   0", "   0", " 0", 
"0", "0", "0", "0", "0", "0", NA, NA, "2018-08-05 04:00:00", 
"0", " 0", "   0", "   0", " 0", "0", "0", "0", "0", "0", "0", 
NA, NA, "2018-08-05 05:00:00", "0", " 0", "   0", "   0", " 0", 
"0", "0", "0", "0", "0", "0", NA, NA, "2018-08-05 06:00:00", 
"0", " 0", "   0", "   0", " 0", "0", "0", "0", "0", "0", "0", 
NA, NA, "2018-08-05 07:00:00", "0", " 0", "   0", "   0", " 0", 
"0", "0", "0", "0", "0", "0", NA, NA, "2018-08-05 08:00:00", 
"0", " 0", "   0", "   0", " 0", "0", "0", "0", "0", "0", "0", 
NA, NA, "2018-08-05 09:00:00", "0", " 0", "   0", "   0", " 0", 
"0", "0", "0", "0", "0", "0", NA, NA, "2018-08-05 10:00:00", 
"0", " 0", "   0", "   0", " 0", "0", "0", "0", "0", "0", "0", 
NA, NA, "2018-08-05 11:00:00", "0", " 0", "   0", "   0", " 0", 
"0", "0", "0", "0", "0", "0", NA, NA, "2018-08-05 12:00:00", 
"0", " 0", "   0", "   0", " 0", "0", "0", "0", "0", "0", "0", 
NA, NA, "2018-08-05 13:00:00", "0", " 0", "   0", "   0", " 0", 
"0", "0", "0", "0", "0", "0", NA, NA, "2018-08-05 14:00:00", 
"0", " 0", "   0", "   0", " 0", "0", "0", "0", "0", "0", "0", 
NA, NA, "2018-08-05 15:00:00", "0", " 0", "   0", "   0", " 0", 
"0", "0", "0", "0", "0", "0", NA, NA, "2018-08-05 16:00:00", 
"0", " 0", "   0", "   0", " 0", "0", "0", "0", "0", "0", "0", 
NA, NA, "2018-08-05 17:00:00", "0", " 0", "   0", "   0", " 0", 
"0", "0", "0", "0", "0", "0", NA, NA, "2018-08-05 18:00:00", 
"0", " 0", "   0", "   0", " 0", "0", "0", "0", "0", "0", "0", 
NA, NA, "2018-08-05 19:00:00", "0", " 0", "   0", "   0", " 0", 
"0", "0", "0", "0", "0", "0", NA, NA, "2018-08-05 20:00:00", 
"0", " 0", "   0", "   0", " 0", "0", "0", "0", "0", "0", "0", 
NA, NA, "2018-08-05 21:00:00", "0", " 0", "   0", "   0", " 0", 
"0", "0", "0", "0", "0", "0", NA, NA, "2018-08-05 22:00:00", 
"0", " 0", "   0", "   0", " 0", "0", "0", "0", "0", "0", "0", 
NA, NA, "2018-08-05 23:00:00", "0", " 0", "   0", "   0", " 0", 
"0", "0", "0", "0", "0", "0", NA, NA, "2018-08-06 00:00:00", 
"0", " 0", "   0", "   0", " 0", "0", "0", "0", "0", "0", "0", 
NA, NA, "2018-08-06 01:00:00", "0", " 0", "   0", "   0", " 0", 
"0", "0", "0", "0", "0", "0", NA, NA, "2018-08-06 02:00:00", 
"0", " 0", "   0", "   0", " 0", "0", "0", "0", "0", "0", "0", 
NA, NA, "2018-08-06 03:00:00", "0", " 0", "   0", "   0", " 0", 
"0", "0", "0", "0", "0", "0", NA, NA, "2018-08-06 04:00:00", 
"0", " 0", "   0", "   4", " 6", "0", "0", "0", "0", "0", "0", 
"NA", NA, "2018-08-06 05:00:00", "0", " 0", "   0", "   0", " 0", 
"0", "0", "0", "0", "0", "0", NA, NA, "2018-08-06 06:00:00", 
"0", " 0", "   0", "   0", " 0", "0", "0", "0", "0", "0", "0", 
NA, NA, "2018-08-06 07:00:00", "0", " 0", "   0", "   0", " 0", 
"0", "0", "0", "0", "0", "0", NA, NA, "2018-08-06 08:00:00", 
"0", " 0", "   0", "   0", " 0", "0", "0", "0", "0", "0", "0", 
NA, NA, "2018-08-06 09:00:00", "1", " 6", "   0", "   1", " 0", 
"0", "2", "2", "0", "0", "0", NA, "0.0000000", "2018-08-06 10:00:00", 
"1", "48", " 774", " 754", " 5", "1", "2", "2", "0", "0", "1", 
"0.2", "0.3333333", "2018-08-06 11:00:00", "1", "13", " 322", 
"1423", " 7", "1", "0", "0", "0", "0", "0", "0.142857142857143", 
NA, "2018-08-06 12:00:00", "2", " 2", "  51", "1672", " 3", "1", 
"2", "2", "0", "0", "0", "0.333333333333333", "0.0000000", "2018-08-06 13:00:00", 
"2", "60", "1377", " 324", "10", "3", "3", "3", "0", "0", "0", 
"0.3", "0.0000000", "2018-08-06 14:00:00", "2", "51", "1009", 
" 478", " 3", "1", "0", "0", "0", "0", "0", "0.333333333333333", 
NA, "2018-08-06 15:00:00", "4", "60", "1196", " 292", " 7", "0", 
"1", "1", "0", "0", "0", "NA", "0.0000000", "2018-08-06 16:00:00", 
"3", "60", "1329", " 378", "15", "1", "0", "0", "0", "0", "0", 
"0.0666666666666667", NA, "2018-08-06 17:00:00", "2", "22", " 481", 
" 995", " 8", "2", "3", "3", "0", "0", "0", "0.25", "0.0000000", 
"2018-08-06 18:00:00", "1", "28", " 391", " 789", " 5", "2", 
"2", "2", "0", "0", "0", "0.4", "0.0000000", "2018-08-06 19:00:00", 
"1", "60", "1169", " 301", " 8", "0", "0", "0", "0", "0", "0", 
"NA", NA, "2018-08-06 20:00:00", "1", "60", "2442", " 421", "33", 
"1", "0", "0", "0", "0", "1", "0.0303030303030303", "1.0000000", 
"2018-08-06 21:00:00", "1", " 1", "   9", "2474", " 0", "0", 
"0", "0", "0", "0", "0", NA, NA, "2018-08-06 22:00:00", "0", 
" 0", "   0", "2353", " 1", "0", "0", "0", "0", "0", "0", "NA", 
NA, "2018-08-06 23:00:00", "0", " 0", "   0", "1430", " 0", "0", 
"0", "0", "0", "0", "0", NA, NA, "2018-08-07 00:00:00", "0", 
" 0", "   0", "1019", " 0", "0", "0", "0", "0", "0", "0", NA, 
NA, "2018-08-07 01:00:00", "0", " 0", "   0", " 805", " 0", "0", 
"0", "0", "0", "0", "0", NA, NA, "2018-08-07 02:00:00", "0", 
" 0", "   0", " 673", " 0", "0", "0", "0", "0", "0", "0", NA, 
NA), .Dim = c(14L, 50L), .Dimnames = list(c("hour", "associate_count", 
"minutes_covered", "plugin_loads", "plugin_unloads", "plugin_opens", 
"chats_started", "claimed_chats", "completed_chats", "sales.number_of_orders", 
"sales.subtotal", "missed_chats", "pct_resulting_in_chat", "missed_pct"
), c("1", "2", "3", "4", "5", "6", "7", "8", "9", "10", "11", 
"12", "13", "14", "15", "16", "17", "18", "19", "20", "21", "22", 
"23", "24", "25", "26", "27", "28", "29", "30", "31", "32", "33", 
"34", "35", "36", "37", "38", "39", "40", "41", "42", "43", "44", 
"45", "46", "47", "48", "49", "50")))

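3 Answers:

Answer 0: (score: 0)

If you want the data in long format and summarised per day, I suggest transposing the data frame and then using an aggregation function: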

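library(dplyr)

# transpose so that each row is one hour and each measure is a column
new_df <- as.data.frame(t(df), stringsAsFactors = FALSE)

# create the date column and only retain the date part
new_df$date <- format(as.Date(new_df$hour, format = "%Y-%m-%d %H:%M:%S"), format = "%Y-%m-%d")

# aggregation; the sample data is character, so convert before summing
agg_df <- new_df %>%
  group_by(date) %>%
  summarise(Tot_associate_count = sum(as.numeric(associate_count), na.rm = TRUE))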

Add the remaining columns in the same way.

Answer 1: (score: 0)

One approach:
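As a rough sketch of what "the remaining columns" could look like, the same per-day sum can be applied to every measure at once with across(). This assumes dplyr 1.0 or later, and the small `df` below is a made-up stand-in for the question's matrix, not the real data:

```r
library(dplyr)

# Toy stand-in for the question's matrix: measures in rows, hours in columns
# (values are invented for illustration)
df <- rbind(
  hour            = c("2018-08-05 22:00:00", "2018-08-05 23:00:00",
                      "2018-08-06 00:00:00", "2018-08-06 01:00:00"),
  associate_count = c("1", "2", "3", "4"),
  minutes_covered = c("10", "20", "30", NA)
)

# One row per hour, one column per measure (all character at this point)
new_df <- as.data.frame(t(df), stringsAsFactors = FALSE)
new_df$date <- substr(new_df$hour, 1, 10)

# Sum every measure per day; `hour` is excluded, `date` is the grouping column
agg_df <- new_df %>%
  group_by(date) %>%
  summarise(across(-hour, ~ sum(as.numeric(.x), na.rm = TRUE)))
```

Summing is only sensible for count-like measures; ratio columns such as pct_resulting_in_chat would probably need a different summary.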

# number of complete 24-column blocks (leftover columns are ignored)
N <- ncol(df) %/% 24

# pre-allocate a list to store the blocks
mylist <- vector("list", N)

# split the data into groups of 24 columns
for (i in seq_len(N)) {
    mylist[[i]] <- df[, ((i - 1) * 24 + 1):(i * 24)]
}

# rbind all 24-column blocks together
do.call(rbind, mylist)
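The same column-block idea can also be written without an explicit loop. The following is only a sketch on a tiny made-up matrix `m`, with a hypothetical `block_size` of 3 standing in for 24:

```r
# Toy stand-in: 2 measures x 6 "hours", split into blocks of 3 columns
m <- matrix(1:12, nrow = 2,
            dimnames = list(c("a", "b"), as.character(1:6)))
block_size <- 3  # would be 24 for hourly data

# Starting column of each complete block
n_blocks <- ncol(m) %/% block_size
starts <- (seq_len(n_blocks) - 1) * block_size + 1

# Extract each block (drop = FALSE keeps it a matrix) and stack them
blocks <- lapply(starts, function(s) m[, s:(s + block_size - 1), drop = FALSE])
stacked <- do.call(rbind, blocks)
```

Here the input is a matrix, as in the sample data. If it were a data.frame, rbind() would match columns by name, so each block's column names would have to be made identical before stacking.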

Answer 2: (score: 0)

To expand on @Lynbakr's point,

you can get everything into long format:

library(tidyr)
library(dplyr)

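# transpose, stack all measures into one column, then convert to numeric
# (as.integer would truncate fractional values such as 0.25 to 0)
df2 <- data.frame(t(df), stringsAsFactors = FALSE) %>% 
  gather(measurement_type, measurement, -hour) %>%
  mutate_at(vars(measurement), as.numeric)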

> head(df2)
                 hour measurement_type measurement
1 2018-08-05 01:00:00  associate_count           0
2 2018-08-05 02:00:00  associate_count           0
3 2018-08-05 03:00:00  associate_count           0
4 2018-08-05 04:00:00  associate_count           0
5 2018-08-05 05:00:00  associate_count           0
6 2018-08-05 06:00:00  associate_count           0

If you really want to split them into separate data.frames, one per day, this returns a list of them:

df2 %>% 
  split(lubridate::day(.$hour))

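This assumes that all measurements are numeric. If you want each measurement as its own column, or if they are not all numeric (for example, there is a categorical variable), you can drop the gather call.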
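For the variant without gather, a minimal sketch might look like the following. The small `df` is again an invented stand-in for the question's matrix, the per-column conversion via type.convert() is one possible choice rather than part of the answer above, and the "UTC" time zone is an assumption:

```r
# Toy stand-in for the question's matrix (values invented for illustration)
df <- rbind(
  hour                  = c("2018-08-05 23:00:00", "2018-08-06 00:00:00"),
  associate_count       = c("1", "2"),
  pct_resulting_in_chat = c("0.25", NA)
)

# One row per hour, one column per measure
df_wide <- data.frame(t(df), stringsAsFactors = FALSE)

# Let each column pick its own type; as.is = TRUE keeps text as character
df_wide[] <- lapply(df_wide, type.convert, as.is = TRUE)

# Parse the timestamp (time zone assumed to be UTC)
df_wide$hour <- as.POSIXct(df_wide$hour, tz = "UTC")

# One data.frame per calendar day
by_day <- split(df_wide, as.Date(df_wide$hour))
```

Splitting on the full date instead of the day of the month keeps, for example, 5 August and 5 September in separate groups.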