How to calculate time differences between groups

Time: 2018-01-28 11:44:37

Tags: r datetime dataframe dplyr

I have a problem with time differences that I'm trying to solve with dplyr. My initial data frame looks like this:

Paper <- data.frame(
  Student = c("A", "A", "A", "A", "A", "A", "B", "B", "B", "B"), 
  Dates = c("2014-04-17", "2014-04-17", "2014-04-17", "2014-04-17", "2014-04-18", "2014-04-18", "2014-04-18", "2014-04-18", "2014-04-18","2014-04-18"),
  Time = c("10:35:00", "11:25:00", "19:15:00", "21:00:00", "22:00:00", "22:21:26", "10:25:00", "11:15:00", "16:05:00", "17:25:00"),
  Connection = c("Initial", "Final", "Initial", "Final", "Initial", "Final", "Initial", "Final", "Initial", "Final")
)

   Student      Dates     Time  Connection
       A    2014-04-17  10:35:00    Initial
       A    2014-04-17  11:25:00      Final
       A    2014-04-17  19:15:00    Initial
       A    2014-04-17  21:00:00      Final
       A    2014-04-18  22:00:00    Initial
       A    2014-04-18  22:21:26      Final
       B    2014-04-18  10:25:00    Initial
       B    2014-04-18  11:15:00      Final
       B    2014-04-18  16:05:00    Initial
       B    2014-04-18  17:25:00      Final

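I'm trying to get the time each Student spent on each Date, counting only the actual time between each "Initial" Connection and the "Final" Connection that follows it. So the data frame I expect looks like this:

  Student     Dates  Time (Minutes)
        A  14-04-17             155
        A  14-04-18           21.43
        B  14-04-18             130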
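As a quick arithmetic check of the expected table (a base R sketch, not part of the original question), each total is the sum of Final minus Initial over the sessions of that Student and Date. The full timestamps below are written out by hand from the sample data, and the `group` labels are hypothetical:

```r
# Initial and Final timestamps of the five sessions in the sample data,
# parsed as UTC so daylight-saving changes cannot affect the differences
initial <- as.POSIXct(c("2014-04-17 10:35:00", "2014-04-17 19:15:00",
                        "2014-04-18 22:00:00",
                        "2014-04-18 10:25:00", "2014-04-18 16:05:00"), tz = "UTC")
final   <- as.POSIXct(c("2014-04-17 11:25:00", "2014-04-17 21:00:00",
                        "2014-04-18 22:21:26",
                        "2014-04-18 11:15:00", "2014-04-18 17:25:00"), tz = "UTC")

# Hypothetical Student/Date label for each session
group <- c("A 2014-04-17", "A 2014-04-17", "A 2014-04-18",
           "B 2014-04-18", "B 2014-04-18")

# Duration of each session in minutes: 50, 105, 21.43333, 50, 80
minutes <- as.numeric(difftime(final, initial, units = "mins"))

# Total per Student/Date, rounded to two decimals like the expected output
totals <- round(tapply(minutes, group, sum), 2)
print(totals)
```

The 21.43 for Student A on 2014-04-18 is 21 minutes 26 seconds, i.e. 1286 / 60.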

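I tried the following and almost got there, but I don't know how to restrict the time difference to the time between each pair of connections ("Initial"/"Final"):

library(dplyr)

Paper$Dates <- as.Date(Paper$Dates, "%Y-%m-%d")
# Time becomes seconds since the epoch (today's date plus the clock time)
Paper$Time <- as.numeric(as.POSIXct(as.character(Paper$Time), format = "%H:%M:%S"))

FinalPaper <- Paper %>%
  group_by(Student, Dates) %>%
  summarise(TimeSpent = sum(diff(Time))) %>%
  mutate(TimeSpent = TimeSpent/60) %>%
  mutate(TimeSpent = round(TimeSpent, digits = 2))

So I got this: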

  Student      Dates   TimeSpent
1       A   2014-04-17    625.00
2       A   2014-04-18     21.43
3       B   2014-04-18    420.00

As you can see, the resulting TimeSpent values are too large. This is because I don't take the Connection column into account, so the wrong time is computed. For example, for Student A on 2014-04-17 it computes the whole span between 10:35:00 and 21:00:00 (625 minutes), which includes the idle time between 11:25:00 and 19:15:00.
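The reason is that differences of consecutive values telescope: summing all of `diff(x)` always gives the last value minus the first, so the idle time between sessions is counted too. A minimal base R sketch for Student A on 2014-04-17, with times encoded as minutes after midnight (an encoding chosen only for readability):

```r
# Student A on 2014-04-17 as minutes after midnight: 10:35, 11:25, 19:15, 21:00
t    <- c(10 * 60 + 35, 11 * 60 + 25, 19 * 60 + 15, 21 * 60)
conn <- c("Initial", "Final", "Initial", "Final")

# Gaps between consecutive rows: 50, 470, 105 (470 is the idle time)
gaps <- diff(t)

# Summing every gap collapses to last - first, i.e. 625 minutes
wrong <- sum(gaps)

# conn[-1] labels each gap with the row it ends on; keeping only the gaps
# that end on a "Final" row gives 50 + 105 = 155
right <- sum(gaps[conn[-1] == "Final"])

print(c(wrong = wrong, right = right))
```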

Thanks a lot!

2 answers:

Answer 0 (score: 3)

You can use cumsum() to add an ID to each "session". This requires the data to be sorted the way you show here. Then we can compute the time difference per session and aggregate again to get the total time spent per student per date:

library(dplyr)

Paper <- data.frame(
  Student = c("A", "A", "A", "A", "A", "A", "B", "B", "B", "B"),
  Dates = c("2014-04-17", "2014-04-17", "2014-04-17", "2014-04-17", "2014-04-18", "2014-04-18", "2014-04-18", "2014-04-18", "2014-04-18", "2014-04-18"),
  Time = c("10:35:00", "11:25:00", "19:15:00", "21:00:00", "22:00:00", "22:21:26", "10:25:00", "11:15:00", "16:05:00", "17:25:00"),
  Connection = c("Initial", "Final", "Initial", "Final", "Initial", "Final", "Initial", "Final", "Initial", "Final")
)

Paper$Dates <- as.Date(Paper$Dates, "%Y-%m-%d")
Paper$Time <- as.numeric(as.POSIXct(as.character(Paper$Time), format = "%H:%M:%S"))

FinalPaper <- Paper %>%
  mutate(seqid = cumsum(Connection == "Initial")) %>%  # one ID per Initial/Final pair
  group_by(Student, Dates, seqid) %>%
  summarise(TimeSpent = sum(diff(Time))) %>%           # seconds within each session
  group_by(Student, Dates) %>%
  summarise(TimeSpent = round(sum(TimeSpent)/60, 2))   # total minutes per student and date

Output:

# A tibble: 3 x 3
# Groups:   Student [2]
  Student      Dates TimeSpent
   <fctr>     <date>     <dbl>
1       A 2014-04-17    155.00
2       A 2014-04-18     21.43
3       B 2014-04-18    130.00
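To see what the `seqid` column does, here is the same idea on plain vectors in base R. This is a sketch: the minute values are a hypothetical encoding of Student A's 2014-04-17 rows, and `tapply()` stands in for the `group_by()`/`summarise()` steps. Because every session starts with an "Initial" row, a running count of `Connection == "Initial"` gives both rows of a pair the same ID:

```r
# Student A on 2014-04-17 as minutes after midnight (10:35, 11:25, 19:15, 21:00)
time <- c(635, 685, 1155, 1260)
conn <- c("Initial", "Final", "Initial", "Final")

# TRUE counts as 1, so the running sum steps up at every "Initial": 1 1 2 2
seqid <- cumsum(conn == "Initial")

# Within each session, diff() yields the single Final - Initial gap: 50 and 105
per_session <- tapply(time, seqid, function(x) sum(diff(x)))

# Aggregating again over the sessions gives the total for this student and date
total <- sum(per_session)
print(seqid)
print(total)
```

As the answer notes, this only works if the rows are sorted and "Initial" and "Final" strictly alternate; an unmatched "Initial" would merge two sessions into one ID.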

Hope this helps!

Answer 1 (score: 3)

Here is a data.table-based solution:

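library(data.table)
setDT(Paper)  # convert Paper to a data.table in place
# Assumes Paper$Time was already converted to numeric seconds, as in the question
Paper[order(Student, Time), .(
    # c(0, diff(Time)) aligns each gap with the row it ends on; keep gaps ending on "Final"
    TimeSpend = sum(c(0, diff(Time))[Connection == "Final"]) / 60
  ), by = .(Student, Dates)]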

   Student      Dates TimeSpend
1:       A 2014-04-17 155.00000
2:       A 2014-04-18  21.43333
3:       B 2014-04-18 130.00000
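The indexing trick `c(0, diff(Time))[Connection == "Final"]` works because prepending a 0 makes the difference vector as long as the group, so each gap lines up with the row where it ends. A base R sketch on Student B's rows, with times as minutes after midnight (a hypothetical encoding, so no division by 60 is needed):

```r
# Student B on 2014-04-18 as minutes after midnight: 10:25, 11:15, 16:05, 17:25
time <- c(625, 675, 965, 1045)
conn <- c("Initial", "Final", "Initial", "Final")

# Prepending 0 gives one value per row: 0, 50, 290, 80
aligned <- c(0, diff(time))

# Keep only the gaps that end on a "Final" row: 50 + 80
spent <- sum(aligned[conn == "Final"])
print(spent)
```

The 290-minute idle gap sits on an "Initial" row, so the logical index drops it.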