I have a question about diff or difftime.
Equip <- c(1001,1001,1001,1002,1002,1002,1003,1003,1003,1003,1003,1003,1003,1003)
Notif <- c(321,322,322,319,319,345,495,495,495,441,441,441,471,471)
Job <- c("01.01.2011","05.01.2011","05.01.2011","05.01.2011","05.01.2011",
"15.01.2011","23.03.2011","23.03.2011","23.03.2011","27.03.2011","27.03.2011",
"27.03.2011","29.03.2011",
"29.03.2011")
Job <- as.Date(Job,format="%d.%m.%Y")
df <- data.frame(Equip,Notif,Job)
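I would like a new column in the data.frame that holds the time difference [in days].
The condition for computing the time difference is as follows: if the Equip number is the same but the Notif number is different, I want the time difference between the Job dates.
The output should look like this: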
df$dd <- c(0,4,4,0,0,10,0,0,0,4,4,4,2,2)
(For the first Notif number within an Equip number, dd is 0, because it is the first visit.)
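I hope you can help me. I have tried, but I could not get it to work the way I want.
I can only use standard R, without any additional packages...
Following the given link, I created the example below, but it does not work either. Maybe you can help me: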
Equips <- c(10006250,10006252,10006252,10006252,10006252,10006252,10006252,
10006252,10006252,10006252,10006252,10006252,10006252,10006252,10006252,10006777)
Notifs <- c(306863771,306862774,306862774,306862774,306933440,
306933440,306998451,306998451,307024311,307024311,
307024311,307024311,307033136,307033136,307128754,307158697)
Jobs <- c("25.01.2011","23.06.2011","23.06.2011","23.06.2011","28.06.2011",
"28.06.2011","02.07.2011","02.07.2011","03.09.2011","03.09.2011",
"03.09.2011","03.09.2011","05.09.2011","05.09.2011","02.11.2011","05.05.2011")
Comps <- c("Service Boiler","General Boiler Components","Ignition and Flame Detection",
"Service Boiler!!!","Electrical Components","Gas Train Assembly",
"Control Box"," Ignition and Flame Detection","CH Components Active",
"CH Components Passive","CH Components Passive","DHW Components",
"DHW Components","Internal Pipeworks and Connections","not grouped in WCC",
"Service Boiler")
Category <- c("service_repair","service_repair","service_repair",
"service_repair","repair","repair","repair","repair","repair","repair",
"repair","repair","repair","repair","repair","service_repair")
# Parse Jobs (not Job), so that the data frame below holds Date values
Jobs <- as.Date(Jobs,format="%d.%m.%Y")
df <- data.frame(Equips,Notifs,Jobs,Comps,Category)
I really do not know why it does not work with this data, although it does with the data from the first post. Maybe you can help me.
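Answer 0 (score: 3)
A somewhat lengthy and possibly convoluted answer that uses only the base packages. Someone with a better knowledge of plyr may be able to offer a more elegant solution.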
> df
Equip Notif Job
1 1001 321 2011-01-01
2 1001 322 2011-01-05
3 1001 322 2011-01-05
4 1002 319 2011-01-05
5 1002 319 2011-01-05
6 1002 345 2011-01-15
7 1003 495 2011-03-23
8 1003 495 2011-03-23
9 1003 495 2011-03-23
10 1003 441 2011-03-27
11 1003 441 2011-03-27
12 1003 441 2011-03-27
13 1003 471 2011-03-29
14 1003 471 2011-03-29
First, get the diff of the dates without any condition:
> df$diff <- c(0,diff(df$Job))
> df
Equip Notif Job diff
1 1001 321 2011-01-01 0
2 1001 322 2011-01-05 4
3 1001 322 2011-01-05 0
4 1002 319 2011-01-05 0
5 1002 319 2011-01-05 0
6 1002 345 2011-01-15 10
7 1003 495 2011-03-23 67
8 1003 495 2011-03-23 0
9 1003 495 2011-03-23 0
10 1003 441 2011-03-27 4
11 1003 441 2011-03-27 0
12 1003 441 2011-03-27 0
13 1003 471 2011-03-29 2
14 1003 471 2011-03-29 0
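To see what this step does in isolation, here is a small sketch on a hypothetical three-date vector (the names d, gaps and lagged are mine, not from the question). diff() on a Date vector returns a difftime in days that is one element shorter than the input, so a leading 0 is added to line it up with the rows again.

```r
# Hypothetical dates, used only to illustrate diff() on Date values
d <- as.Date(c("2011-01-01", "2011-01-05", "2011-01-15"))

gaps <- diff(d)                     # difftime in days, length(d) - 1 elements
lagged <- c(0, as.numeric(gaps))    # pad with 0 so it has one value per row

lagged
# [1]  0  4 10
```

Putting the plain number 0 first in c() is also what turns the difftime into an ordinary numeric vector in the step above.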
Create a new column diff1 that is 1 where the condition is true and 0 where it is false:
> df$diff1 <- c(0, ifelse(diff(df$Equip) == 0 & diff(df$Notif) != 0, 1, 0))
> df
Equip Notif Job diff diff1
1 1001 321 2011-01-01 0 0
2 1001 322 2011-01-05 4 1
3 1001 322 2011-01-05 0 0
4 1002 319 2011-01-05 0 0
5 1002 319 2011-01-05 0 0
6 1002 345 2011-01-15 10 1
7 1003 495 2011-03-23 67 0
8 1003 495 2011-03-23 0 0
9 1003 495 2011-03-23 0 0
10 1003 441 2011-03-27 4 1
11 1003 441 2011-03-27 0 0
12 1003 441 2011-03-27 0 0
13 1003 471 2011-03-29 2 1
14 1003 471 2011-03-29 0 0
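The condition can be read as two separate comparisons between each row and the row before it. A minimal sketch on the first four rows of the data (same_equip, new_notif and flag are illustrative names, not part of the solution above):

```r
# First four rows of the example data
Equip <- c(1001, 1001, 1001, 1002)
Notif <- c(321, 322, 322, 319)

same_equip <- diff(Equip) == 0    # TRUE  TRUE  FALSE: same equipment as previous row
new_notif  <- diff(Notif) != 0    # TRUE  FALSE TRUE : notification changed

# 1 only where both hold; the first row has no predecessor, so it gets 0
flag <- c(0, as.integer(same_equip & new_notif))

flag
# [1] 0 1 0 0
```

Row 4 starts a new Equip, so it is flagged 0 even though its Notif changed.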
Multiply the two columns so that the diff value is kept only where the condition is true:
> df$diff <- df$diff * df$diff1
> df$diff1 <- NULL
> df
Equip Notif Job diff
1 1001 321 2011-01-01 0
2 1001 322 2011-01-05 4
3 1001 322 2011-01-05 0
4 1002 319 2011-01-05 0
5 1002 319 2011-01-05 0
6 1002 345 2011-01-15 10
7 1003 495 2011-03-23 0
8 1003 495 2011-03-23 0
9 1003 495 2011-03-23 0
10 1003 441 2011-03-27 4
11 1003 441 2011-03-27 0
12 1003 441 2011-03-27 0
13 1003 471 2011-03-29 2
14 1003 471 2011-03-29 0
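Where readings are repeated, merge the data with itself so that the value is repeated on every row of the same visit. (This step may need to change if the data set has other columns.)
> res <- merge(df[,1:3], df[df$diff!=0,], all.x=TRUE)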
> res
Equip Notif Job diff
1 1001 321 2011-01-01 NA
2 1001 322 2011-01-05 4
3 1001 322 2011-01-05 4
4 1002 319 2011-01-05 NA
5 1002 319 2011-01-05 NA
6 1002 345 2011-01-15 10
7 1003 441 2011-03-27 4
8 1003 441 2011-03-27 4
9 1003 441 2011-03-27 4
10 1003 471 2011-03-29 2
11 1003 471 2011-03-29 2
12 1003 495 2011-03-23 NA
13 1003 495 2011-03-23 NA
14 1003 495 2011-03-23 NA
Replace NA with 0:
> res[is.na(res)] <- 0
> res
Equip Notif Job diff
1 1001 321 2011-01-01 0
2 1001 322 2011-01-05 4
3 1001 322 2011-01-05 4
4 1002 319 2011-01-05 0
5 1002 319 2011-01-05 0
6 1002 345 2011-01-15 10
7 1003 441 2011-03-27 4
8 1003 441 2011-03-27 4
9 1003 441 2011-03-27 4
10 1003 471 2011-03-29 2
11 1003 471 2011-03-29 2
12 1003 495 2011-03-23 0
13 1003 495 2011-03-23 0
14 1003 495 2011-03-23 0
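Note that merge() sorts the result by the key columns, so the rows for Notif 495 end up after 441 and 471. If the original row order matters, one possible alternative in base R is sketched below. It is not the method above: visit_gap is a hypothetical helper, and it assumes that the visits of each Equip appear in chronological order in the data.

```r
Equip <- c(1001,1001,1001,1002,1002,1002,1003,1003,1003,1003,1003,1003,1003,1003)
Notif <- c(321,322,322,319,319,345,495,495,495,441,441,441,471,471)
Job <- as.Date(c("01.01.2011","05.01.2011","05.01.2011","05.01.2011","05.01.2011",
                 "15.01.2011","23.03.2011","23.03.2011","23.03.2011","27.03.2011",
                 "27.03.2011","27.03.2011","29.03.2011","29.03.2011"),
               format = "%d.%m.%Y")
df <- data.frame(Equip, Notif, Job)

# Hypothetical helper: days since the previous visit of the same equipment
visit_gap <- function(df) {
  # Keep one row per (Equip, Notif) visit, in order of first appearance
  first <- !duplicated(df[, c("Equip", "Notif")])
  visits <- df[first, c("Equip", "Notif", "Job")]
  # Within each Equip: 0 for the first visit, then the gap to the previous one
  visits$dd <- ave(as.numeric(visits$Job), visits$Equip,
                   FUN = function(x) c(0, diff(x)))
  # Copy each visit's gap back onto all of its rows, keeping the row order
  key_all <- paste(df$Equip, df$Notif)
  key_vis <- paste(visits$Equip, visits$Notif)
  visits$dd[match(key_all, key_vis)]
}

df$dd <- visit_gap(df)
df$dd
# [1]  0  4  4  0  0 10  0  0  0  4  4  4  2  2
```

This reproduces the dd column requested in the question without reordering the rows, and because it only adds a column, any extra columns in the data frame are left untouched.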
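For the second example data, which contains more columns, replace the last two steps with the following (the columns are named Equip, Notif and Job, as in the first example):
res <- merge(df[,c('Equip', 'Notif', 'Job', 'Comps', 'Category')], df[ df$diff !=0 ,c('Equip', 'Notif', 'Job', 'diff')], all.x=TRUE)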
res[is.na(res)] <- 0
res
Equip Notif Job Comps Category diff
1 10006250 306863771 2011-01-25 Service Boiler service_repair 0
2 10006252 306862774 2011-06-23 General Boiler Components service_repair 0
3 10006252 306862774 2011-06-23 Ignition and Flame Detection service_repair 0
4 10006252 306862774 2011-06-23 Service Boiler!!! service_repair 0
5 10006252 306933440 2011-06-28 Electrical Components repair 5
6 10006252 306933440 2011-06-28 Gas Train Assembly repair 5
7 10006252 306998451 2011-07-02 Control Box repair 4
8 10006252 306998451 2011-07-02 Ignition and Flame Detection repair 4
9 10006252 307024311 2011-09-03 CH Components Active repair 63
10 10006252 307024311 2011-09-03 CH Components Passive repair 63
11 10006252 307024311 2011-09-03 CH Components Passive repair 63
12 10006252 307024311 2011-09-03 DHW Components repair 63
13 10006252 307033136 2011-09-05 DHW Components repair 2
14 10006252 307033136 2011-09-05 Internal Pipeworks and Connections repair 2
15 10006252 307128754 2011-11-02 not grouped in WCC repair 58
16 10006777 307158697 2011-05-05 Service Boiler service_repair 0
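If the extra columns vary between data sets, the steps could be wrapped in a small helper that carries them through without listing them by name. This is only a sketch: add_diff and the toy data frame df2 are hypothetical, and it assumes columns named Equip, Notif and Job with the rows already ordered by equipment and date.

```r
# Hypothetical wrapper around the steps shown in this answer
add_diff <- function(df) {
  keys <- c("Equip", "Notif", "Job")
  # Unconditional day difference to the previous row
  df$diff <- c(0, diff(df$Job))
  # 1 where the equipment is unchanged and the notification is new, else 0
  flag <- c(0, diff(df$Equip) == 0 & diff(df$Notif) != 0)
  df$diff <- df$diff * flag
  # Left-join the non-zero gaps back onto every row of the same visit
  res <- merge(df[, setdiff(names(df), "diff"), drop = FALSE],
               unique(df[df$diff != 0, c(keys, "diff")]),
               by = keys, all.x = TRUE)
  res$diff[is.na(res$diff)] <- 0
  res
}

# Toy data with one extra column
df2 <- data.frame(
  Equip = c(1, 1, 1, 2),
  Notif = c(10, 11, 11, 20),
  Job   = as.Date(c("2011-06-23", "2011-06-28", "2011-06-28", "2011-05-05")),
  Comps = c("a", "b", "c", "d")
)
res2 <- add_diff(df2)
res2$diff
# [1] 0 5 5 0
```

As with the merge above, the result is sorted by the key columns rather than kept in the original row order.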