I have a dataset with records for each machine in a lab:
MachineID InstalledDate SwitchedOnDate Status
1 2010-02-18 2010-02-19 SleepMode
1 2010-02-18 2010-02-20 Active
1 2010-02-18 2010-02-21 SleepMode
1 2010-02-18 2010-02-22 Active
2 2010-02-20 2010-02-21 Active
2 2010-02-20 2010-02-22 SleepMode
3 2010-02-10 2010-02-18 SleepMode
4 2010-03-10 2010-03-15 SleepMode
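I am trying to find the number of days each machine took to first become Active after its installation date, that is, SwitchedOnDate - InstalledDate for the first Active row of each machine.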
Answer 0 (score: 3)
In data.table, it is basically the same idea:
library(data.table)
setDT(df)  # convert df to a data.table by reference
# Date of the first Active row minus the earliest SwitchedOnDate, per machine
df[, SwitchedOnDate[which.max(Status == "Active")] - min(SwitchedOnDate),
   by = MachineID]
If you want to name the output column (e.g. OffDuration), the syntax changes slightly:
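# Filtering on Status in i drops machines that never become Active
# (and means min() below only sees the Active rows)
df[Status == "Active",
   .(OffDuration =
       SwitchedOnDate[which.max(Status == "Active")] - min(SwitchedOnDate)),
   by = MachineID]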
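One detail worth checking in the data.table snippets is how which.max behaves for a machine that never reaches Active. The following is a minimal base R sketch with made-up vectors (the variable names are illustrative, not from the answer):

```r
# which.max() returns the index of the first maximum of its argument.
# For a logical vector with at least one TRUE, that is the first TRUE.
status <- c("SleepMode", "Active", "SleepMode", "Active")
first_active <- which.max(status == "Active")

# When no element is TRUE, all values tie at FALSE and the first
# position is returned, so the result is 1 rather than NA.
never_active <- c("SleepMode", "SleepMode")
fallback <- which.max(never_active == "Active")

# match() is one alternative that yields NA when nothing matches.
no_match <- match("Active", never_active)

print(c(first_active = first_active, fallback = fallback, no_match = no_match))
```

Applied to the snippet without a Status filter, machines 3 and 4 (which have no Active row) would therefore come out as 0 days instead of being dropped.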
Answer 1 (score: 2)
Following comments from @Gregor and @Frank, a better approach is to use distinct to keep only the first row for each MachineID, instead of grouping by MachineID:
library(dplyr)
res <- df %>% filter(Status == "Active") %>%
  distinct(MachineID, .keep_all = TRUE) %>%
  mutate(Days.Go.Active = difftime(SwitchedOnDate, InstalledDate, units = "days"))
print(res)
##Source: local data frame [2 x 5]
##Groups: MachineID [2]
##
## MachineID InstalledDate SwitchedOnDate Status Days.Go.Active
## <int> <date> <date> <chr> <S3: difftime>
##1 1 2010-02-18 2010-02-20 Active 2 days
##2 2 2010-02-20 2010-02-21 Active 1 days
With dplyr, you can use mutate with difftime to compute the difference in units of "days":
library(dplyr)
res <- df %>% group_by(MachineID) %>%
  filter(Status == "Active") %>%
  filter(row_number() == 1) %>%
  mutate(Days.Go.Active = difftime(SwitchedOnDate, InstalledDate, units = "days"))
print(res)
##Source: local data frame [2 x 5]
##Groups: MachineID [2]
##
## MachineID InstalledDate SwitchedOnDate Status Days.Go.Active
## <int> <date> <date> <chr> <S3: difftime>
##1 1 2010-02-18 2010-02-20 Active 2 days
##2 2 2010-02-20 2010-02-21 Active 1 days
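To see what difftime returns on its own, here is a small base R sketch using machine 1's dates from the question. The variable names are only illustrative:

```r
installed <- as.Date("2010-02-18")
first_active <- as.Date("2010-02-20")

# difftime() returns an object of class "difftime" that carries its units.
gap <- difftime(first_active, installed, units = "days")

# as.numeric() drops the class and leaves a plain number of days.
gap_days <- as.numeric(gap)

# Asking for other units rescales the same interval.
gap_hours <- as.numeric(difftime(first_active, installed, units = "hours"))

print(gap)
print(c(days = gap_days, hours = gap_hours))
```

Keeping the difftime class preserves the unit label in printed output, while converting with as.numeric is convenient when the column feeds into further arithmetic.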
In the dplyr pipeline, we group_by MachineID and then use filter to keep only the first row of each group whose Status is Active.
Data:
df <- structure(list(MachineID = c(1L, 1L, 1L, 1L, 2L, 2L, 3L, 4L),
InstalledDate = structure(c(14658, 14658, 14658, 14658, 14660,
14660, 14650, 14678), class = "Date"), SwitchedOnDate = structure(c(14659,
14660, 14661, 14662, 14661, 14662, 14658, 14683), class = "Date"),
Status = c("SleepMode", "Active", "SleepMode", "Active",
"Active", "SleepMode", "SleepMode", "SleepMode")), .Names = c("MachineID",
"InstalledDate", "SwitchedOnDate", "Status"), row.names = c(NA,
-8L), class = "data.frame")
## MachineID InstalledDate SwitchedOnDate Status
##1 1 2010-02-18 2010-02-19 SleepMode
##2 1 2010-02-18 2010-02-20 Active
##3 1 2010-02-18 2010-02-21 SleepMode
##4 1 2010-02-18 2010-02-22 Active
##5 2 2010-02-20 2010-02-21 Active
##6 2 2010-02-20 2010-02-22 SleepMode
##7 3 2010-02-10 2010-02-18 SleepMode
##8 4 2010-03-10 2010-03-15 SleepMode
# Variant: count days from each machine's first recorded SwitchedOnDate
# instead of from InstalledDate
res <- df %>% group_by(MachineID) %>%
  mutate(FirstSwitchedOnDate = first(SwitchedOnDate)) %>%
  filter(Status == "Active") %>%
  filter(row_number() == 1) %>%
  mutate(Days.Go.Active = as.numeric(difftime(SwitchedOnDate, FirstSwitchedOnDate, units = "days"))) %>%
  select(-FirstSwitchedOnDate)
##Source: local data frame [2 x 5]
##Groups: MachineID [2]
##
## MachineID InstalledDate SwitchedOnDate Status Days.Go.Active
## <int> <date> <date> <chr> <dbl>
##1 1 2010-02-18 2010-02-20 Active 1
##2 2 2010-02-20 2010-02-21 Active 0
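For comparison, the same two-step logic (first recorded SwitchedOnDate per machine, then first Active row) can be sketched in base R without dplyr. This is an illustrative alternative rather than part of the original answer; it assumes the rows are already ordered by SwitchedOnDate within each machine, as in the sample data, and first_active is a name introduced only for this sketch:

```r
# Sample data from the question, rebuilt with readable dates.
df <- data.frame(
  MachineID = c(1L, 1L, 1L, 1L, 2L, 2L, 3L, 4L),
  InstalledDate = as.Date(c("2010-02-18", "2010-02-18", "2010-02-18",
                            "2010-02-18", "2010-02-20", "2010-02-20",
                            "2010-02-10", "2010-03-10")),
  SwitchedOnDate = as.Date(c("2010-02-19", "2010-02-20", "2010-02-21",
                             "2010-02-22", "2010-02-21", "2010-02-22",
                             "2010-02-18", "2010-03-15")),
  Status = c("SleepMode", "Active", "SleepMode", "Active",
             "Active", "SleepMode", "SleepMode", "SleepMode"),
  stringsAsFactors = FALSE
)

# match(x, x) gives, for every row, the index of the first row with the
# same MachineID, so this copies each machine's first SwitchedOnDate.
df$FirstSwitchedOnDate <- df$SwitchedOnDate[match(df$MachineID, df$MachineID)]

# Keep the Active rows, then only the first one per machine.
active <- df[df$Status == "Active", ]
first_active <- active[!duplicated(active$MachineID), ]

# Subtracting two Date vectors yields a difftime in days.
first_active$Days.Go.Active <- as.numeric(
  first_active$SwitchedOnDate - first_active$FirstSwitchedOnDate
)
first_active$FirstSwitchedOnDate <- NULL

print(first_active)
```

Machines 3 and 4 drop out at the filtering step because they have no Active row, which mirrors the behaviour of the dplyr pipeline.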