How do you calculate the years since an accident occurred in R? My data looks like this:
Year Fatal Non-Fatal
French Airline 1989 1 1
French Airline 1990 1 0
French Airline 1991 0 0
French Airline 1992 0 1
French Airline 1993 0 0
UK Airline 1989 1 1
UK Airline 1990 0 0
UK Airline 1991 1 0
UK Airline 1992 0 0
UK Airline 1993 0 0
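Grouped by airline, I would like additional columns that count the years since the last fatal or non-fatal crash occurred. The output would look like this: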
Year Fatal Non-Fatal Since Fatal Since Non-Fatal
French Airline 1989 1 1 0 0
French Airline 1990 1 0 0 1
French Airline 1991 0 0 1 2
French Airline 1992 0 1 2 0
French Airline 1993 0 0 3 1
UK Airline 1989 1 1 0 0
UK Airline 1990 0 0 1 1
UK Airline 1991 1 0 0 2
UK Airline 1992 0 0 1 3
UK Airline 1993 0 0 2 4
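Ideally, I would also have one more column counting the years since a crash of any type, fatal or non-fatal. How can I do this?
Answer 0 (score: 2)
Using a rolling join, which requires version 1.9.7+ of the data.table package: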
library(data.table)
# data borrowed from @majom's answer
d[, sf :=
d[as.logical(Fatal)][d, on=.(Airline, Year), roll=TRUE, Year-x.Year]]
d[, sn :=
d[as.logical(Non_Fatal)][d, on=.(Airline, Year), roll=TRUE, Year-x.Year]]
d[, sa :=
d[as.logical(pmax(Fatal, Non_Fatal))][d, on=.(Airline, Year), roll=TRUE, Year-x.Year]]
which gives
Airline Year Fatal Non_Fatal sf sn sa
1: French_Airline 1989 1 1 0 0 0
2: French_Airline 1990 1 0 0 1 0
3: French_Airline 1991 0 0 1 2 1
4: French_Airline 1992 0 1 2 0 0
5: French_Airline 1993 0 0 3 1 1
6: UK_Airline 1989 1 1 0 0 0
7: UK_Airline 1990 0 0 1 1 1
8: UK_Airline 1991 1 0 0 2 0
9: UK_Airline 1992 0 0 1 3 1
10: UK_Airline 1993 0 0 2 4 2
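Or, for the last one, d[, sa := pmin(sf, sn)] if you have already computed those two.
The coercion to logical should be a hint that those columns ought to be stored as logical in the first place...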
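As a rough sketch of that last remark, the indicator columns can be converted to logical once, after which the `as.logical()` calls in the joins are no longer needed. The table below is rebuilt by hand from the question's data, and `cols` is just an illustrative helper variable; everything else uses the same rolling join as above.

```r
library(data.table)

# Data rebuilt from the question (assumed to match the other answer's `d`)
d <- data.table(
  Airline   = rep(c("French_Airline", "UK_Airline"), each = 5),
  Year      = rep(1989:1993, times = 2),
  Fatal     = c(1, 1, 0, 0, 0, 1, 0, 1, 0, 0),
  Non_Fatal = c(1, 0, 0, 1, 0, 1, 0, 0, 0, 0)
)

# Store the indicator columns as logical once, up front
cols <- c("Fatal", "Non_Fatal")
d[, (cols) := lapply(.SD, as.logical), .SDcols = cols]

# A logical column wrapped in parentheses can be used directly as a row filter
d[, sf := d[(Fatal)][d, on = .(Airline, Year), roll = TRUE, Year - x.Year]]
d[, sn := d[(Non_Fatal)][d, on = .(Airline, Year), roll = TRUE, Year - x.Year]]
d[, sa := d[Fatal | Non_Fatal][d, on = .(Airline, Year), roll = TRUE, Year - x.Year]]
```

Because the join subtracts actual Year values, this version should also cope with gaps in the years. Rows that come before an airline's first matching crash have nothing to roll forward from, so they would be expected to come out as NA.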
Answer 1 (score: 1)
Here is a way to compute the variables per airline using cumulative sums of fatal / non-fatal / any crashes:
# Load required package
library(data.table)
# Setup data
d <- fread("Airline Year Fatal Non_Fatal
French_Airline 1989 1 1
French_Airline 1990 1 0
French_Airline 1991 0 0
French_Airline 1992 0 1
French_Airline 1993 0 0
UK_Airline 1989 1 1
UK_Airline 1990 0 0
UK_Airline 1991 1 0
UK_Airline 1992 0 0
UK_Airline 1993 0 0", header=TRUE)
# Since fatal calculation
d[ ,Cumsum.Fatal:=cumsum(Fatal), by=Airline]
d[Fatal!=1, Since.Fatal:=1:.N, by=list(Airline, Cumsum.Fatal)]
d[Fatal==1, Since.Fatal:=0]
# Since non-fatal calculation
d[ ,Cumsum.Non_Fatal:=cumsum(Non_Fatal), by=Airline]
d[Non_Fatal!=1, Since.Non_Fatal:=1:.N, by=list(Airline, Cumsum.Non_Fatal)]
d[Non_Fatal==1, Since.Non_Fatal:=0]
# Since any (fatal or non-fatal) crash
d[, Any_Crash:=ifelse(I(Fatal+Non_Fatal)>=1, 1, 0)]
d[ ,Cumsum.Any_Crash:=cumsum(Any_Crash), by=Airline]
d[Any_Crash!=1, Since.Any_Crash:=1:.N, by=list(Airline, Cumsum.Any_Crash)]
d[Any_Crash==1, Since.Any_Crash:=0]
The final data.table looks like this:
# > d
# Airline Year Fatal Non_Fatal Cumsum.Fatal Since.Fatal Cumsum.Non_Fatal Since.Non_Fatal Any_Crash Cumsum.Any_Crash Since.Any_Crash
# 1: French_Airline 1989 1 1 1 0 1 0 1 1 0
# 2: French_Airline 1990 1 0 2 0 1 1 1 2 0
# 3: French_Airline 1991 0 0 2 1 1 2 0 2 1
# 4: French_Airline 1992 0 1 2 2 2 0 1 3 0
# 5: French_Airline 1993 0 0 2 3 2 1 0 3 1
# 6: UK_Airline 1989 1 1 1 0 1 0 1 1 0
# 7: UK_Airline 1990 0 0 1 1 1 1 0 1 1
# 8: UK_Airline 1991 1 0 2 0 1 2 1 2 0
# 9: UK_Airline 1992 0 0 2 1 1 3 0 2 1
# 10: UK_Airline 1993 0 0 2 2 1 4 0 2 2
Update
Here is a more efficient way to write this, courtesy of Frank (see comments):
d[, sf  := seq(.N) - 1L, by = .(Airline, cumsum(Fatal))
  ][, snf := seq(.N) - 1L, by = .(Airline, cumsum(Non_Fatal))
  ][, sa  := seq(.N) - 1L, by = .(Airline, cumsum(pmax(Fatal, Non_Fatal)))][]
# Airline Year Fatal Non_Fatal Cumsum.Fatal Since.Fatal Cumsum.Non_Fatal Since.Non_Fatal Any_Crash Cumsum.Any_Crash Since.Any_Crash sf snf sa
# 1: French_Airline 1989 1 1 1 0 1 0 1 1 0 0 0 0
# 2: French_Airline 1990 1 0 2 0 1 1 1 2 0 0 1 0
# 3: French_Airline 1991 0 0 2 1 1 2 0 2 1 1 2 1
# 4: French_Airline 1992 0 1 2 2 2 0 1 3 0 2 0 0
# 5: French_Airline 1993 0 0 2 3 2 1 0 3 1 3 1 1
# 6: UK_Airline 1989 1 1 1 0 1 0 1 1 0 0 0 0
# 7: UK_Airline 1990 0 0 1 1 1 1 0 1 1 1 1 1
# 8: UK_Airline 1991 1 0 2 0 1 2 1 2 0 0 2 0
# 9: UK_Airline 1992 0 0 2 1 1 3 0 2 1 1 3 1
# 10: UK_Airline 1993 0 0 2 2 1 4 0 2 2 2 4 2
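The grouping trick used here (each crash starts a new run, and the counter is the row's position within its run) can also be sketched in base R without data.table. This is only an illustration: `years_since` is a hypothetical helper name, and it assumes one row per airline per year, already sorted by year, so that counting rows is the same as counting years.

```r
# Hypothetical helper: rows since the last event within each group
years_since <- function(event, group) {
  # Each event starts a new run within its group; cumsum() labels the runs
  run_id <- ave(event, group, FUN = cumsum)
  # Position within each (group, run) pair, shifted to start at zero
  ave(seq_along(event), group, run_id, FUN = seq_along) - 1L
}

# Data rebuilt from the question
d <- data.frame(
  Airline   = rep(c("French_Airline", "UK_Airline"), each = 5),
  Year      = rep(1989:1993, times = 2),
  Fatal     = c(1, 1, 0, 0, 0, 1, 0, 1, 0, 0),
  Non_Fatal = c(1, 0, 0, 1, 0, 1, 0, 0, 0, 0)
)

d$sf <- years_since(d$Fatal, d$Airline)
d$sn <- years_since(d$Non_Fatal, d$Airline)
d$sa <- years_since(pmax(d$Fatal, d$Non_Fatal), d$Airline)
```

On this data the three columns match the sf, snf and sa values shown above. One caveat of the run-counting approach: if an airline's first rows contain no crash at all, they form a run of their own and are counted from 0 upward, even though no crash has happened yet, so such rows may need to be set to NA separately.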