I have a problem with time differences that I'm trying to solve with dplyr. My initial data frame looks like this:
Paper <- data.frame(
Student = c("A", "A", "A", "A", "A", "A", "B", "B", "B", "B"),
Dates = c("2014-04-17", "2014-04-17", "2014-04-17", "2014-04-17", "2014-04-18", "2014-04-18", "2014-04-18", "2014-04-18", "2014-04-18","2014-04-18"),
Time = c("10:35:00", "11:25:00", "19:15:00", "21:00:00", "22:00:00", "22:21:26", "10:25:00", "11:15:00", "16:05:00", "17:25:00"),
Connection = c("Initial", "Final", "Initial", "Final", "Initial", "Final", "Initial", "Final", "Initial", "Final")
)
or, printed:
Student Dates Time Connection
A 2014-04-17 10:35:00 Initial
A 2014-04-17 11:25:00 Final
A 2014-04-17 19:15:00 Initial
A 2014-04-17 21:00:00 Final
A 2014-04-18 22:00:00 Initial
A 2014-04-18 22:21:26 Final
B 2014-04-18 10:25:00 Initial
B 2014-04-18 11:15:00 Final
B 2014-04-18 16:05:00 Initial
B 2014-04-18 17:25:00 Final
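For each Student and date (Dates), I'm trying to get the time spent, where the time of each Connection is the actual difference between its "Initial" and its "Final" time. So my expected data frame looks like this:
Student Dates Time (Minutes)
A 14-04-17 155
A 14-04-18 21.43
B 14-04-18 130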
I tried this and almost got the solution, but I don't know how to compute the time difference within each connection ("Initial" / "Final"):
library(dplyr)

Paper$Dates <- as.Date(Paper$Dates, "%Y-%m-%d")
Paper$Time <- as.numeric(as.POSIXct(as.character(Paper$Time),
                                    format = "%H:%M:%S"))
FinalPaper <-
  Paper %>%
  group_by(Student, Dates) %>%
  summarise(TimeSpent = sum(diff(Time))) %>%
  mutate(TimeSpent = TimeSpent/60) %>%
  mutate(TimeSpent = round(TimeSpent, digits = 2))
So I got this:
Student Dates TimeSpent
1 A 2014-04-17 625.00
2 A 2014-04-18 21.43
3 B 2014-04-18 420.00
As the resulting TimeSpent shows, the times are too long. This is because I'm not taking the connections into account, so the wrong time is calculated. For example, for student A on 2014-04-17 it calculates the time between 10:35:00 and 21:00:00, which includes the gap between the two connections.
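The overshoot can be checked by hand. Within one group, sum(diff(x)) telescopes to the last value minus the first, so the idle gap between connections is counted too. A small base-R sketch for student A on 2014-04-17 (the vector names times, secs and conn are just illustrative, and tz = "UTC" is an assumption to avoid daylight-saving surprises):

```r
# Clock times for student A on 2014-04-17, in the order they appear in Paper
times <- as.POSIXct(c("10:35:00", "11:25:00", "19:15:00", "21:00:00"),
                    format = "%H:%M:%S", tz = "UTC")
secs <- as.numeric(times)
conn <- c("Initial", "Final", "Initial", "Final")

# sum(diff(x)) collapses to x[n] - x[1], i.e. 10:35:00 -> 21:00:00
wrong <- sum(diff(secs)) / 60

# Only the Initial -> Final differences should count (50 + 105 minutes)
right <- sum(secs[conn == "Final"] - secs[conn == "Initial"]) / 60

print(c(wrong = wrong, right = right))
```

The pairing in `right` assumes every "Initial" is immediately followed by its own "Final", as in the sample data.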
Thanks a lot!!
Answer 0 (score: 3)
You can use cumsum() to add an ID for each 'session'. The prerequisite is that the data is sorted the way you show here. Then we can compute the time difference per session, and aggregate again to get the total time spent per student per date:
library(dplyr)

Paper <- data.frame(
  Student = c("A", "A", "A", "A", "A", "A", "B", "B", "B", "B"),
  Dates = c("2014-04-17", "2014-04-17", "2014-04-17", "2014-04-17", "2014-04-18", "2014-04-18", "2014-04-18", "2014-04-18", "2014-04-18","2014-04-18"),
  Time = c("10:35:00", "11:25:00", "19:15:00", "21:00:00", "22:00:00", "22:21:26", "10:25:00", "11:15:00", "16:05:00", "17:25:00"),
  Connection = c("Initial", "Final", "Initial", "Final", "Initial", "Final", "Initial", "Final", "Initial", "Final")
)
Paper$Dates <- as.Date(Paper$Dates, "%Y-%m-%d")
Paper$Time <- as.numeric(as.POSIXct(as.character(Paper$Time),
                                    format = "%H:%M:%S"))
FinalPaper <- Paper %>%
  # every "Initial" row starts a new session
  mutate(seqid = cumsum(Connection == "Initial")) %>%
  # duration of each session in seconds
  group_by(Student, Dates, seqid) %>%
  summarise(TimeSpent = sum(diff(Time))) %>%
  # total minutes per student per date
  group_by(Student, Dates) %>%
  summarise(TimeSpent = round(sum(TimeSpent)/60, 2))
Output:
# A tibble: 3 x 3
# Groups: Student [2]
Student Dates TimeSpent
<fctr> <date> <dbl>
1 A 2014-04-17 155.00
2 A 2014-04-18 21.43
3 B 2014-04-18 130.00
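If dplyr isn't available, the same session-ID idea can be sketched in base R with aggregate(). This is a hedged sketch, not part of the original answer: the helper column Secs is a hypothetical name, each session's length is taken as max minus min of its times, and, like the answer, it assumes the rows are sorted so that each "Initial" is directly followed by its "Final".

```r
Paper <- data.frame(
  Student = c("A", "A", "A", "A", "A", "A", "B", "B", "B", "B"),
  Dates = c("2014-04-17", "2014-04-17", "2014-04-17", "2014-04-17", "2014-04-18",
            "2014-04-18", "2014-04-18", "2014-04-18", "2014-04-18", "2014-04-18"),
  Time = c("10:35:00", "11:25:00", "19:15:00", "21:00:00", "22:00:00",
           "22:21:26", "10:25:00", "11:15:00", "16:05:00", "17:25:00"),
  Connection = c("Initial", "Final", "Initial", "Final", "Initial",
                 "Final", "Initial", "Final", "Initial", "Final"),
  stringsAsFactors = FALSE
)

# Hypothetical helper column: clock time as seconds (same day, UTC assumed)
Paper$Secs <- as.numeric(as.POSIXct(Paper$Time, format = "%H:%M:%S", tz = "UTC"))

# Each "Initial" starts a new session (requires the sort order shown above)
Paper$seqid <- cumsum(Paper$Connection == "Initial")

# Length of each session in seconds: span between its earliest and latest time
sessions <- aggregate(Secs ~ Student + Dates + seqid, data = Paper,
                      FUN = function(s) max(s) - min(s))

# Total minutes per student per date, rounded like the dplyr version
result <- aggregate(Secs ~ Student + Dates, data = sessions,
                    FUN = function(s) round(sum(s) / 60, 2))
names(result)[names(result) == "Secs"] <- "TimeSpent"
print(result)
```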
Hope this helps!
Answer 1 (score: 3)
Here is a solution based on data.table:
library(data.table)
setDT(Paper)
Paper[order(Student, Time), .(
TimeSpend = sum(c(0,diff(Time))[Connection == "Final"])/60
), by = .(Student, Dates)]
Student Dates TimeSpend
1: A 2014-04-17 155.00000
2: A 2014-04-18 21.43333
3: B 2014-04-18 130.00000
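The key step in this answer is c(0, diff(Time))[Connection == "Final"]. A stripped-down sketch of that indexing on plain vectors, using student A's 2014-04-17 times encoded as seconds after 10:35:00 (an illustrative encoding, not taken from the answer):

```r
# 10:35:00, 11:25:00, 19:15:00, 21:00:00 as seconds after 10:35:00
secs <- c(0, 3000, 31200, 37500)
conn <- c("Initial", "Final", "Initial", "Final")

# diff() returns n - 1 values; prepending 0 realigns them so that
# element i is the gap between row i and row i - 1
gaps <- c(0, diff(secs))   # 0, 3000, 28200, 6300

# On a "Final" row that gap is exactly the connection it closes;
# gaps on "Initial" rows are idle time between connections and are dropped
minutes <- sum(gaps[conn == "Final"]) / 60
print(minutes)
```

This only works because the rows are ordered by time within each group, which is why the answer sorts with order(Student, Time) before grouping.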