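Marking indicators with R data.table

Time: 2018-01-12 15:10:17

Tags: r

I have a dataset on which I want to do the following, but I haven't been able to find the best solution.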

Name  Date  Paid  Outstanding  Mark as Follows       Close Indicator
A     2000   100          200  Open                                0
A     2001   224          100  Open                                0
A     2002   348          400  Open                                0
A     2003   472            0  First Time it Closes                1
A     2004   596          196  Reopens                            -1
B     2004   720          200  Open                                0
B     2005   844          200  Open                                0
B     2006   968            0  First Time it Closes                1
B     2007   968            0  Closes                              0
C     2000  1092          200  Open                                0
C     2001  1216         1200  Open                                0
B     2008  1340         1200  Reopens                            -1
B     2010  1464          100  Open                                0
B     2011  1588            0  Closes                              1
A     2016  1712            0  Closes                              1
D     2009  1836           60  Open                                0
D     2010  1896            0  Closes                              1
D     2016  1900            0  Closes                              0

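I want to be able to reproduce the Close Indicator column. These are cumulative transaction amounts. My logic is, by Name: if it has been paid and nothing is Outstanding, I want to mark it as 1, meaning closed. However, if the case opens again later, I want to mark it -1, and then 1 again when it closes. So A closes in 2003, reopens in 2004, and closes again in 2016.

For D, the case closes in 2010, but the payment changes in 2016. In theory that would also get a reopen flag, but since it is closed again at the same time, I want that row to be handled as no change (0, as in the table above).

What is the best way to do this with an R data.table?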
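The rules above can be read as a two-state machine per Name: a case is closed while Outstanding is zero and open otherwise, and only the transitions are flagged. A minimal base R sketch, assuming one Name's Outstanding values sorted by Date, no missing values, and that the first row is never flagged; `close_flags` is a hypothetical helper name:

```r
# Hypothetical helper: flags for one Name, Outstanding sorted by Date.
# 1 = open -> closed, -1 = closed -> reopened, 0 = no change of state.
close_flags <- function(outstanding) {
  flags <- integer(length(outstanding))
  if (length(outstanding) == 0L) return(flags)
  closed <- outstanding[1L] == 0            # state at the first observation
  for (i in seq_along(outstanding)[-1L]) {
    now_closed <- outstanding[i] == 0
    if (now_closed && !closed) flags[i] <- 1L    # open -> closed
    if (!now_closed && closed) flags[i] <- -1L   # closed -> reopened
    closed <- now_closed
  }
  flags
}

close_flags(c(200, 100, 400, 0, 196, 0))  # A: 0 0 0 1 -1 1
close_flags(c(60, 0, 0))                  # D: 0 1 0 (2016 stays closed, so no flag)
```

Because D's 2016 row is still closed, the state does not change and the row gets 0, which is the behaviour asked for.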

2 answers:

Answer 0 (score: 0)

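The logic is, for each Name:

  • if Outstanding is zero when it was previously nonzero, close_indicator is 1
  • if Outstanding is nonzero when it was previously zero, close_indicator is -1
  • otherwise (Outstanding stayed zero, or stayed nonzero), close_indicator is 0

For each Name, this can be expressed as (Outstanding == 0) - (lag(Outstanding) == 0). This is the difference between two logicals, which are coerced to 0 or 1. The first row of each Name has no previous value, so lag() gives NA there, and that NA is then replaced with 0.

So all we have to do is group by Name, sort by Date, and apply that formula.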
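A small worked example of that formula in base R, using Name A's Outstanding values in Date order. Here `c(NA, head(x, -1))` is used as a package-free stand-in for lag()/shift():

```r
# Name A's Outstanding values, sorted by Date (2000-2004, then 2016)
outstanding <- c(200, 100, 400, 0, 196, 0)
# Previous year's value; the first year has none, so it is NA
prev <- c(NA, head(outstanding, -1))

flag <- (outstanding == 0) - (prev == 0)   # NA 0 0 1 -1 1
flag[is.na(flag)] <- 0L                    # no predecessor, so no flag
flag                                       # 0 0 0 1 -1 1
```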

library('tidyverse')
df <- tribble(
  ~Name, ~Date, ~Outstanding,
    "A", 2000L,         200L,
    "A", 2001L,         100L,
    "A", 2002L,         400L,
    "A", 2003L,           0L,
    "A", 2004L,         196L,
    "B", 2004L,         200L,
    "B", 2005L,         200L,
    "B", 2006L,           0L,
    "B", 2007L,           0L,
    "C", 2000L,         200L,
    "C", 2001L,        1200L,
    "B", 2008L,        1200L,
    "B", 2010L,         100L,
    "B", 2011L,           0L,
    "A", 2016L,           0L,
    "D", 2009L,          60L,
    "D", 2010L,           0L,
    "D", 2016L,           0L
)

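df %>%
  rowid_to_column %>%                          # remember the original row order
  group_by(Name) %>%
  arrange(Date) %>%                            # so lag() sees each Name's previous year
  mutate(close_indicator = (Outstanding == 0) - (lag(Outstanding) == 0)) %>%
  replace_na(list(close_indicator = 0)) %>%    # first row of each Name has no lag
  arrange(rowid)                               # restore the original order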
# # A tibble: 18 x 5
# # Groups:   Name [4]
#    rowid  Name  Date Outstanding close_indicator
#    <int> <chr> <int>       <int>           <dbl>
#  1     1     A  2000         200               0
#  2     2     A  2001         100               0
#  3     3     A  2002         400               0
#  4     4     A  2003           0               1
#  5     5     A  2004         196              -1
#  6     6     B  2004         200               0
#  7     7     B  2005         200               0
#  8     8     B  2006           0               1
#  9     9     B  2007           0               0
# 10    10     C  2000         200               0
# 11    11     C  2001        1200               0
# 12    12     B  2008        1200              -1
# 13    13     B  2010         100               0
# 14    14     B  2011           0               1
# 15    15     A  2016           0               1
# 16    16     D  2009          60               0
# 17    17     D  2010           0               1
# 18    18     D  2016           0               0
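The same group, sort, and apply-the-formula steps can also be sketched in base R without any packages. This assumes the 18 sample rows below (copied from the tribble above); `flag_changes` is a hypothetical helper, and `c(0L, diff(closed))` equals `closed - lag(closed)` with the first row set to 0:

```r
# The 18 sample rows, copied from the tribble above
df <- data.frame(
  Name = c("A", "A", "A", "A", "A", "B", "B", "B", "B",
           "C", "C", "B", "B", "B", "A", "D", "D", "D"),
  Date = c(2000L, 2001L, 2002L, 2003L, 2004L, 2004L, 2005L, 2006L, 2007L,
           2000L, 2001L, 2008L, 2010L, 2011L, 2016L, 2009L, 2010L, 2016L),
  Outstanding = c(200L, 100L, 400L, 0L, 196L, 200L, 200L, 0L, 0L,
                  200L, 1200L, 1200L, 100L, 0L, 0L, 60L, 0L, 0L)
)

# Hypothetical helper: 1 while closed (Outstanding == 0), 0 while open;
# diff() gives closed - lag(closed), and the leading 0L covers the first row
flag_changes <- function(outstanding) {
  closed <- as.integer(outstanding == 0)
  c(0L, diff(closed))
}

ord <- order(df$Name, df$Date)   # group by Name, Date order within each group
df$close_indicator <- NA_integer_
df$close_indicator[ord] <- ave(df$Outstanding[ord], df$Name[ord], FUN = flag_changes)
df                               # rows stay in their original order
```

Computing on the sorted index `ord` and writing back through it keeps the original row order, which is what `rowid_to_column` and the final `arrange(rowid)` achieve in the tidyverse version.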

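With data.table, it can be done with:

library(data.table)
dt <- as.data.table(df)  # same data as above; each Name's rows are already in Date order
dt[, close_indicator := (Outstanding == 0) - (shift(Outstanding) == 0), by = Name]
dt[is.na(close_indicator), close_indicator := 0L]  # first row of each Name has no previous value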
#     Name Date Outstanding close_indicator
#  1:    A 2000         200               0
#  2:    A 2001         100               0
#  3:    A 2002         400               0
#  4:    A 2003           0               1
#  5:    A 2004         196              -1
#  6:    B 2004         200               0
#  7:    B 2005         200               0
#  8:    B 2006           0               1
#  9:    B 2007           0               0
# 10:    C 2000         200               0
# 11:    C 2001        1200               0
# 12:    B 2008        1200              -1
# 13:    B 2010         100               0
# 14:    B 2011           0               1
# 15:    A 2016           0               1
# 16:    D 2009          60               0
# 17:    D 2010           0               1
# 18:    D 2016           0               0
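The data.table code above relies on each Name's rows already being in Date order, which happens to hold for this sample. A sketch that sorts first with setorder() and passes a per-group fill value to shift(), so no separate NA step is needed, assuming the data.table package is installed. The rows are the Name A and D rows from the question's sample, deliberately shuffled:

```r
library(data.table)

# Rows for Names A and D from the question's sample, deliberately out of order
dt <- data.table(
  Name        = c("A", "D", "A", "A", "D", "A", "A", "D", "A"),
  Date        = c(2016L, 2016L, 2003L, 2000L, 2009L, 2004L, 2002L, 2010L, 2001L),
  Outstanding = c(0L, 0L, 0L, 200L, 60L, 196L, 400L, 0L, 100L)
)

setorder(dt, Name, Date)   # sorts by reference: Name, then Date within Name

dt[, close_indicator := {
  closed <- as.integer(Outstanding == 0)          # 1 = closed, 0 = open
  # previous year's status; the first row of each Name is filled with its
  # own status, so it always gets 0 rather than NA
  closed - shift(closed, n = 1L, fill = closed[1L], type = "lag")
}, by = Name]

dt
```

After sorting, A gets 0 0 0 1 -1 1 and D gets 0 1 0, the same flags as in the output above.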

Answer 1 (score: 0)

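Using data.table. I'm not sure I fully understand your criteria, but I think the example below should be enough to get you started. The main function that should help you here is data.table's shift() function. Combined with a grouped operation (the by = .(Name) clause), it lets you add a column holding the previous balance.

Once that column exists, you can use compound logical conditions to put the required flag on the relevant rows, according to your criteria.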

library(data.table)

DT <- data.table(Name = c("A", "A", "A", "A", "A","A", "A", "A", "B", "B","B", "B", "B"),
           Date = c(2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2000, 2001, 2002, 2003),
           Outstanding = c(200, 100, 600 ,400, 0, 196, 200, 0, 500, 600, 0, 200, 0))

setkey(DT,Name,Date)

## Add a new column for previous outstanding balance
DT[,Prev_Outstanding := shift(Outstanding, n = 1L, fill = NA, type = "lag"), by = .(Name)]

DT[, CloseIndicator := 0]  ## Pre-fill all rows with 0
## Rows whose Prev_Outstanding is NA (first year of each Name) are not selected,
## because data.table treats NA in i as FALSE
DT[Prev_Outstanding > 0 & Outstanding == 0, CloseIndicator := 1]   ## Mark account closings
DT[Prev_Outstanding == 0 & Outstanding > 0, CloseIndicator := -1]  ## Mark account re-openings

print(DT)

Yields:

    Name Date Outstanding Prev_Outstanding CloseIndicator
 1:    A 2000         200               NA              0
 2:    A 2001         100              200              0
 3:    A 2002         600              100              0
 4:    A 2003         400              600              0
 5:    A 2004           0              400              1
 6:    A 2005         196                0             -1
 7:    A 2006         200              196              0
 8:    A 2007           0              200              1
 9:    B 2000         600               NA              0
10:    B 2001           0              600              1
11:    B 2002         200                0             -1
12:    B 2003           0              200              1
13:    B 2008         500                0             -1
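For balances that are never negative, these compound conditions flag the same rows as the difference-of-logicals formula (Outstanding == 0) - (lag(Outstanding) == 0). A base R sketch checking this on the B rows of the data above (sorted by Date), assuming non-negative balances:

```r
# Name B's Outstanding values from the data above, sorted by Date
cur  <- c(600, 0, 200, 0, 500)
prev <- c(NA, head(cur, -1))          # lagged balance; NA for the first year

# Compound conditions; NA rows are excluded, as they are in data.table's i
closes  <- !is.na(prev) & prev > 0 & cur == 0
reopens <- !is.na(prev) & prev == 0 & cur > 0
flag_compound <- as.integer(closes) - as.integer(reopens)

# Difference of two logicals, with the first row's NA set to 0
flag_diff <- (cur == 0) - (prev == 0)
flag_diff[is.na(flag_diff)] <- 0L

identical(flag_compound, flag_diff)   # TRUE for this data
```

With negative balances the two would differ, since `prev > 0` and `prev != 0` are then no longer the same test.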