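Looking back to an event in R

Time: 2016-08-08 12:34:43

Tags: r dataframe data.table

How do you calculate the number of years since an incident occurred in R? My data looks like this: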

                 Year  Fatal  Non-Fatal
French Airline   1989    1       1 
French Airline   1990    1       0 
French Airline   1991    0       0
French Airline   1992    0       1
French Airline   1993    0       0
UK Airline       1989    1       1
UK Airline       1990    0       0
UK Airline       1991    1       0
UK Airline       1992    0       0
UK Airline       1993    0       0

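Grouped by airline, I would like additional columns that count the years since the last fatal crash and since the last non-fatal crash. The output would look like this: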

                 Year  Fatal  Non-Fatal  Since Fatal  Since Non-Fatal               
French Airline   1989    1       1           0              0
French Airline   1990    1       0           0              1
French Airline   1991    0       0           1              2
French Airline   1992    0       1           2              0 
French Airline   1993    0       0           3              1
UK Airline       1989    1       1           0              0 
UK Airline       1990    0       0           1              1
UK Airline       1991    1       0           0              2
UK Airline       1992    0       0           1              3
UK Airline       1993    0       0           2              4

In an ideal world, I would also have one more column at the end counting the years since any type of crash, fatal or non-fatal. How do I do this?

2 answers:

Answer 0: (score: 2)

Using a rolling join, which requires version 1.9.7+ of the data.table package:

library(data.table)

# data borrowed from @majom's answer (the other answer, which builds d with fread())

d[, sf := 
  d[as.logical(Fatal)][d, on=.(Airline, Year), roll=TRUE, Year-x.Year]]

d[, sn := 
  d[as.logical(Non_Fatal)][d, on=.(Airline, Year), roll=TRUE, Year-x.Year]]

d[, sa := 
  d[as.logical(pmax(Fatal, Non_Fatal))][d, on=.(Airline, Year), roll=TRUE, Year-x.Year]]

which gives

           Airline Year Fatal Non_Fatal sf sn sa
 1: French_Airline 1989     1         1  0  0  0
 2: French_Airline 1990     1         0  0  1  0
 3: French_Airline 1991     0         0  1  2  1
 4: French_Airline 1992     0         1  2  0  0
 5: French_Airline 1993     0         0  3  1  1
 6:     UK_Airline 1989     1         1  0  0  0
 7:     UK_Airline 1990     0         0  1  1  1
 8:     UK_Airline 1991     1         0  0  2  0
 9:     UK_Airline 1992     0         0  1  3  1
10:     UK_Airline 1993     0         0  2  4  2

Or, for the last one: d[, sa := pmin(sf, sn)], if you have already computed those.

The coercion to logical should be a hint that those columns ought to be stored as logical in the first place...
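To make the idea behind the rolling join more concrete, here is a minimal base-R sketch of the same lookup for a single airline. It is an analogue, not the data.table code above: `findInterval()` stands in for `roll=TRUE`, and the toy vectors `years` and `event_years` are hypothetical. For each year it finds the most recent event year at or before it and subtracts.

```r
# Hypothetical toy data for one airline (not taken from the question)
years       <- 1989:1993
event_years <- c(1989L, 1991L)   # sorted years in which a crash occurred

# Index of the latest event year at or before each year (0 if there is none yet)
idx <- findInterval(years, event_years)

# Prepend NA so that index 0 maps to "no earlier event"
last_event <- c(NA_integer_, event_years)[idx + 1L]

# Years elapsed since the most recent event
since <- years - last_event
```

Because the difference is taken on the actual year values, this kind of lookup still gives the right answer when some years are missing from the data, and it yields NA for years before the first recorded event.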

Answer 1: (score: 1)

Here is a way to compute the variables using the cumulative sum of fatal/non-fatal/any crashes by airline:

# Load required package
library(data.table)

# Setup data
d <- fread("Airline Year  Fatal  Non_Fatal
French_Airline   1989    1       1 
French_Airline   1990    1       0 
French_Airline   1991    0       0
French_Airline   1992    0       1
French_Airline   1993    0       0
UK_Airline       1989    1       1
UK_Airline       1990    0       0
UK_Airline       1991    1       0
UK_Airline       1992    0       0
UK_Airline       1993    0       0", header=TRUE)

# Since fatal calculation
d[ ,Cumsum.Fatal:=cumsum(Fatal), by=Airline]
d[Fatal!=1, Since.Fatal:=1:.N, by=list(Airline, Cumsum.Fatal)]
d[Fatal==1, Since.Fatal:=0]

# Since non-fatal calculation
d[ ,Cumsum.Non_Fatal:=cumsum(Non_Fatal), by=Airline]
d[Non_Fatal!=1, Since.Non_Fatal:=1:.N, by=list(Airline, Cumsum.Non_Fatal)]
d[Non_Fatal==1, Since.Non_Fatal:=0]

# Since any (fatal or non-fatal) crash 
d[, Any_Crash:=as.integer(Fatal+Non_Fatal>=1)]
d[ ,Cumsum.Any_Crash:=cumsum(Any_Crash), by=Airline]
d[Any_Crash!=1, Since.Any_Crash:=1:.N, by=list(Airline, Cumsum.Any_Crash)]
d[Any_Crash==1, Since.Any_Crash:=0]

The final data.table looks like this:

# > d
# Airline           Year Fatal Non_Fatal Cumsum.Fatal Since.Fatal Cumsum.Non_Fatal Since.Non_Fatal Any_Crash Cumsum.Any_Crash Since.Any_Crash
# 1: French_Airline 1989     1         1            1           0                1               0         1                1               0
# 2: French_Airline 1990     1         0            2           0                1               1         1                2               0
# 3: French_Airline 1991     0         0            2           1                1               2         0                2               1
# 4: French_Airline 1992     0         1            2           2                2               0         1                3               0
# 5: French_Airline 1993     0         0            2           3                2               1         0                3               1
# 6:     UK_Airline 1989     1         1            1           0                1               0         1                1               0
# 7:     UK_Airline 1990     0         0            1           1                1               1         0                1               1
# 8:     UK_Airline 1991     1         0            2           0                1               2         1                2               0
# 9:     UK_Airline 1992     0         0            2           1                1               3         0                2               1
# 10:    UK_Airline 1993     0         0            2           2                1               4         0                2               2


Update

Here is a more efficient way to write this, courtesy of Frank (see comments):

d[, sf  := seq(.N)-1L, by=.(Airline, cumsum(Fatal))
  ][, snf := seq(.N)-1L, by=.(Airline, cumsum(Non_Fatal))
  ][, sa  := seq(.N)-1L, by=.(Airline, cumsum(pmax(Fatal, Non_Fatal)))
  ][]

# Airline           Year Fatal Non_Fatal Cumsum.Fatal Since.Fatal Cumsum.Non_Fatal Since.Non_Fatal Any_Crash Cumsum.Any_Crash Since.Any_Crash sf snf sa
# 1: French_Airline 1989     1         1            1           0                1               0         1                1               0  0   0  0
# 2: French_Airline 1990     1         0            2           0                1               1         1                2               0  0   1  0
# 3: French_Airline 1991     0         0            2           1                1               2         0                2               1  1   2  1
# 4: French_Airline 1992     0         1            2           2                2               0         1                3               0  2   0  0
# 5: French_Airline 1993     0         0            2           3                2               1         0                3               1  3   1  1
# 6:     UK_Airline 1989     1         1            1           0                1               0         1                1               0  0   0  0
# 7:     UK_Airline 1990     0         0            1           1                1               1         0                1               1  1   1  1
# 8:     UK_Airline 1991     1         0            2           0                1               2         1                2               0  0   2  0
# 9:     UK_Airline 1992     0         0            2           1                1               3         0                2               1  1   3  1
# 10:    UK_Airline 1993     0         0            2           2                1               4         0                2               2  2   4  2
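
The cumulative-sum grouping used in this answer can also be sketched in base R, which may help show why it works. The helper `years_since()` below is hypothetical (it appears in neither answer) and uses `ave()` in place of data.table's `by`. It assumes rows are sorted by year within each airline and that there is one row per consecutive year, since it counts rows, not year differences. Unlike the code above, it returns NA for rows before an airline's first crash, where there is nothing to look back to.

```r
# Hypothetical helper: rows since the last event within each group
years_since <- function(event, group) {
  # Running count of events per group; it increases at every event row,
  # so rows sharing a value belong to the same "since last event" stretch
  run <- ave(event, group, FUN = cumsum)
  # Position within each (group, run) stretch, starting at 0 on the event row
  since <- ave(event, group, run, FUN = seq_along) - 1L
  # Rows before a group's first event have no earlier event
  since[run == 0] <- NA
  since
}

# Same values as the question's data, in the same row order
airline   <- rep(c("French_Airline", "UK_Airline"), each = 5)
fatal     <- c(1L, 1L, 0L, 0L, 0L, 1L, 0L, 1L, 0L, 0L)
non_fatal <- c(1L, 0L, 0L, 1L, 0L, 1L, 0L, 0L, 0L, 0L)

sf <- years_since(fatal, airline)                    # since last fatal crash
sn <- years_since(non_fatal, airline)                # since last non-fatal crash
sa <- years_since(pmax(fatal, non_fatal), airline)   # since any crash
```

On this data the three vectors match the sf, snf and sa columns printed above, because every airline has a crash in its first year.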