Getting a difftime for a data set

Time: 2013-02-13 07:49:40

Tags: r

I have a question about diff or difftime.

Equip <- c(1001,1001,1001,1002,1002,1002,1003,1003,1003,1003,1003,1003,1003,1003)
Notif <- c(321,322,322,319,319,345,495,495,495,441,441,441,471,471)
Job <- c("01.01.2011","05.01.2011","05.01.2011","05.01.2011","05.01.2011",
"15.01.2011","23.03.2011","23.03.2011","23.03.2011","27.03.2011","27.03.2011",
"27.03.2011","29.03.2011",
"29.03.2011")
Job <- as.Date(Job,format="%d.%m.%Y")
df <- data.frame(Equip,Notif,Job)

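I would like a new column in the data.frame that holds the time difference [in days].

The condition for computing the time difference is as follows: if the Equip number is the same but the Notif number is different, I want the time difference between the Job dates.

The output should look like this: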

df$dd <- c(0,4,4,0,0,10,0,0,0,4,4,4,2,2)

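(For the first Notif number within an Equip number, dd is 0, because that is the first visit.)

I hope you can help me. I have tried, but I could not get it to work the way I want.

I can only use standard R without any additional packages...

Following the given link, I created the example below, but it does not work either:

Maybe you can help me: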

Equips <- c(10006250,10006252,10006252,10006252,10006252,10006252,10006252,
10006252,10006252,10006252,10006252,10006252,10006252,10006252,10006252,10006777)
Notifs <- c(306863771,306862774,306862774,306862774,306933440,
306933440,306998451,306998451,307024311,307024311,
307024311,307024311,307033136,307033136,307128754,307158697)
Jobs <- c("25.01.2011","23.06.2011","23.06.2011","23.06.2011","28.06.2011",
"28.06.2011","02.07.2011","02.07.2011","03.09.2011","03.09.2011",
"03.09.2011","03.09.2011","05.09.2011","05.09.2011","02.11.2011","05.05.2011")
Comps <- c("Service Boiler","General Boiler Components","Ignition and Flame Detection",
"Service Boiler!!!","Electrical Components","Gas Train Assembly",
"Control Box"," Ignition and Flame Detection","CH Components Active",
"CH Components Passive","CH Components Passive","DHW Components",
"DHW Components","Internal Pipeworks and Connections","not grouped in WCC",
"Service Boiler")
Category <- c("service_repair","service_repair","service_repair",
"service_repair","repair","repair","repair","repair","repair","repair",
"repair","repair","repair","repair","repair","service_repair")
Jobs <- as.Date(Jobs,format="%d.%m.%Y")
df <- data.frame(Equips,Notifs,Jobs,Comps,Category)

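I really do not know why it does not work with this data, although it does with the data from the first post. Maybe you can help me.

1 answer:

Answer 0 (score: 3)

This is a somewhat verbose and possibly convoluted answer that uses only the base packages. Someone who knows plyr better may be able to offer a more elegant solution.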

> df
   Equip Notif        Job
1   1001   321 2011-01-01
2   1001   322 2011-01-05
3   1001   322 2011-01-05
4   1002   319 2011-01-05
5   1002   319 2011-01-05
6   1002   345 2011-01-15
7   1003   495 2011-03-23
8   1003   495 2011-03-23
9   1003   495 2011-03-23
10  1003   441 2011-03-27
11  1003   441 2011-03-27
12  1003   441 2011-03-27
13  1003   471 2011-03-29
14  1003   471 2011-03-29

First, take the diff of the dates without any conditions

> df$diff <- c(0,diff(df$Job))
> df
   Equip Notif        Job diff
1   1001   321 2011-01-01    0
2   1001   322 2011-01-05    4
3   1001   322 2011-01-05    0
4   1002   319 2011-01-05    0
5   1002   319 2011-01-05    0
6   1002   345 2011-01-15   10
7   1003   495 2011-03-23   67
8   1003   495 2011-03-23    0
9   1003   495 2011-03-23    0
10  1003   441 2011-03-27    4
11  1003   441 2011-03-27    0
12  1003   441 2011-03-27    0
13  1003   471 2011-03-29    2
14  1003   471 2011-03-29    0
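
As a small illustration of what this step relies on: diff() on a Date vector returns a difftime in days, one element shorter than the input, which is why a leading 0 is prepended. The sketch below uses a made-up three-date vector (not part of the original data) and converts the result to plain numbers explicitly.

```r
# Hypothetical three-date vector, in the same dd.mm.yyyy format as the question
d <- as.Date(c("01.01.2011", "05.01.2011", "15.01.2011"), format = "%d.%m.%Y")

gaps <- diff(d)          # difftime of length 2, units are days
gap_class <- class(gaps) # "difftime"

# Prepend 0 for the first row and drop the difftime class explicitly
days <- c(0, as.numeric(gaps, units = "days"))
days                     # 0 4 10
```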

Create a new column diff1 that is 1 where the condition is true and 0 where it is false

> df$diff1 <- c(0, ifelse(diff(df$Equip) == 0 & diff(df$Notif) != 0, 1, 0))
> df
   Equip Notif        Job diff diff1
1   1001   321 2011-01-01    0     0
2   1001   322 2011-01-05    4     1
3   1001   322 2011-01-05    0     0
4   1002   319 2011-01-05    0     0
5   1002   319 2011-01-05    0     0
6   1002   345 2011-01-15   10     1
7   1003   495 2011-03-23   67     0
8   1003   495 2011-03-23    0     0
9   1003   495 2011-03-23    0     0
10  1003   441 2011-03-27    4     1
11  1003   441 2011-03-27    0     0
12  1003   441 2011-03-27    0     0
13  1003   471 2011-03-29    2     1
14  1003   471 2011-03-29    0     0
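
The flag can be checked in isolation on a few rows. The sketch below uses the first four Equip and Notif values from the data; the vector names equip, notif and flag are only for illustration. It shows why a row where the equipment changes (the last one here) gets 0 even though its Notif differs from the previous row.

```r
equip <- c(1001, 1001, 1001, 1002)
notif <- c(321, 322, 322, 319)

# Same equipment as the previous row AND a different notification
flag <- c(0, ifelse(diff(equip) == 0 & diff(notif) != 0, 1, 0))
flag   # 0 1 0 0
```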

Multiply the two columns so that the diff value is kept only where the condition is true

> df$diff <- df$diff * df$diff1
> df$diff1 <- NULL
> df
   Equip Notif        Job diff
1   1001   321 2011-01-01    0
2   1001   322 2011-01-05    4
3   1001   322 2011-01-05    0
4   1002   319 2011-01-05    0
5   1002   319 2011-01-05    0
6   1002   345 2011-01-15   10
7   1003   495 2011-03-23    0
8   1003   495 2011-03-23    0
9   1003   495 2011-03-23    0
10  1003   441 2011-03-27    4
11  1003   441 2011-03-27    0
12  1003   441 2011-03-27    0
13  1003   471 2011-03-29    2
14  1003   471 2011-03-29    0

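Merge the data with itself so that the value is repeated for repeated readings. (This step may need to change if the data set has other columns.)

> res <- merge(df[,1:3], df[df$diff!=0,], all.x=TRUE)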
> res
   Equip Notif        Job diff
1   1001   321 2011-01-01   NA
2   1001   322 2011-01-05    4
3   1001   322 2011-01-05    4
4   1002   319 2011-01-05   NA
5   1002   319 2011-01-05   NA
6   1002   345 2011-01-15   10
7   1003   441 2011-03-27    4
8   1003   441 2011-03-27    4
9   1003   441 2011-03-27    4
10  1003   471 2011-03-29    2
11  1003   471 2011-03-29    2
12  1003   495 2011-03-23   NA
13  1003   495 2011-03-23   NA
14  1003   495 2011-03-23   NA

Replace NA with 0

> res[is.na(res)] <- 0
> res
   Equip Notif        Job diff
1   1001   321 2011-01-01    0
2   1001   322 2011-01-05    4
3   1001   322 2011-01-05    4
4   1002   319 2011-01-05    0
5   1002   319 2011-01-05    0
6   1002   345 2011-01-15   10
7   1003   441 2011-03-27    4
8   1003   441 2011-03-27    4
9   1003   441 2011-03-27    4
10  1003   471 2011-03-29    2
11  1003   471 2011-03-29    2
12  1003   495 2011-03-23    0
13  1003   495 2011-03-23    0
14  1003   495 2011-03-23    0
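
Note that merge() reorders the rows by the key columns. If the original row order matters, one possible base-R variant is to spread the value within each (Equip, Notif) group with ave() instead of merging. This is only a sketch, not part of the answer above, and it assumes that the rows are sorted by date within each Equip and that the rows of one Notif are adjacent; the names step and new_visit are made up for the example.

```r
# Data from the first example in the question
Equip <- c(1001,1001,1001,1002,1002,1002,1003,1003,1003,1003,1003,1003,1003,1003)
Notif <- c(321,322,322,319,319,345,495,495,495,441,441,441,471,471)
Job <- as.Date(c("01.01.2011","05.01.2011","05.01.2011","05.01.2011","05.01.2011",
                 "15.01.2011","23.03.2011","23.03.2011","23.03.2011","27.03.2011",
                 "27.03.2011","27.03.2011","29.03.2011","29.03.2011"),
               format = "%d.%m.%Y")
df <- data.frame(Equip, Notif, Job)

# Gap in days to the previous row; 0 for the very first row
step <- c(0, as.numeric(diff(df$Job), units = "days"))

# TRUE only on the first row of a new notification for the same equipment
new_visit <- c(FALSE, diff(df$Equip) == 0 & diff(df$Notif) != 0)
step[!new_visit] <- 0

# Copy the value of the group's first row to every row of that (Equip, Notif) group
df$dd <- ave(step, df$Equip, df$Notif,
             FUN = function(x) rep(x[1], length(x)))
df$dd   # 0 4 4 0 0 10 0 0 0 4 4 4 2 2
```

Because nothing is merged, extra columns such as Comps or Category would not need special handling in this variant.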

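For the second example data, which has more columns, replace the last two steps with the following (the column names used here follow the first example: Equip, Notif, Job):

res <- merge(df[, c('Equip', 'Notif', 'Job', 'Comps', 'Category')],
             df[df$diff != 0, c('Equip', 'Notif', 'Job', 'diff')],
             all.x=TRUE)
res[is.na(res)] <- 0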
res
      Equip     Notif        Job                              Comps       Category diff
1  10006250 306863771 2011-01-25                     Service Boiler service_repair    0
2  10006252 306862774 2011-06-23          General Boiler Components service_repair    0
3  10006252 306862774 2011-06-23       Ignition and Flame Detection service_repair    0
4  10006252 306862774 2011-06-23                  Service Boiler!!! service_repair    0
5  10006252 306933440 2011-06-28              Electrical Components         repair    5
6  10006252 306933440 2011-06-28                 Gas Train Assembly         repair    5
7  10006252 306998451 2011-07-02                        Control Box         repair    4
8  10006252 306998451 2011-07-02       Ignition and Flame Detection         repair    4
9  10006252 307024311 2011-09-03               CH Components Active         repair   63
10 10006252 307024311 2011-09-03              CH Components Passive         repair   63
11 10006252 307024311 2011-09-03              CH Components Passive         repair   63
12 10006252 307024311 2011-09-03                     DHW Components         repair   63
13 10006252 307033136 2011-09-05                     DHW Components         repair    2
14 10006252 307033136 2011-09-05 Internal Pipeworks and Connections         repair    2
15 10006252 307128754 2011-11-02                 not grouped in WCC         repair   58
16 10006777 307158697 2011-05-05                     Service Boiler service_repair    0
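
When the data has extra columns, it can help to name the merge keys explicitly and to replace NA only in the diff column, since res[is.na(res)] <- 0 would also overwrite a genuine NA in any other column. A minimal sketch on made-up data (the visits and gaps frames and their values are hypothetical):

```r
# Hypothetical data: three rows for one equipment, with one extra column
visits <- data.frame(
  Equip = c(1, 1, 1),
  Notif = c(10, 11, 11),
  Comps = c("a", "b", "c"),
  stringsAsFactors = FALSE
)

# One row per notification that has a non-zero gap
gaps <- data.frame(Equip = 1, Notif = 11, diff = 4)

# Join on the named keys only; rows without a match get NA in diff
res <- merge(visits, gaps, by = c("Equip", "Notif"), all.x = TRUE)

# Replace NA in the diff column only, leaving other columns untouched
res$diff[is.na(res$diff)] <- 0
res
```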