Create a column to identify "sessions" within a data frame

Time: 2014-02-18 20:20:49

Tags: r dataset

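I have a large set of transaction data that tracks purchases, returns, and the point at which the point-of-sale operator clears the transaction after receiving payment or issuing a refund. I want to number the sessions based on when the cashier "clears" the screen, so that all transactions leading up to a given clear share the same session number.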

I have stripped out all the non-essential data, but here is what the dput() looks like:

my.data.1<-structure(list(TOTSND_Clear = c("0", "0", "0", "0", "0", "0", 
"4.00", "0", "0", "10.00", "0", "0", "12.00", "0", "-5.00"), 
    TOTSND_UNBAL = c("0", "1.00", "0", "0", "0", "0", "0", "0", 
    "0", "0", "0", "0", "0", "0", "0")), .Names = c("TOTSND_Clear", 
"TOTSND_UNBAL"), row.names = c(NA, 15L), class = "data.frame")

Which looks like this (first rows only):

TOTSND_Clear    TOTSND_UNBAL
    0             0
    0             1.00
    0             0
    0             0
    0             0
    0             0
    4.00          0

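All of those zeros represent other kinds of transactions taking place, whether a sale or a refund. When either TOTSND_Clear or TOTSND_UNBAL has a non-zero value, the transaction instance is ending. The numbers are dollar amounts, not counts of a transaction type (they just happen to look like counts in this example).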
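The rule described above can be sketched as a logical flag per row. This is only an illustration: the data frame is rebuilt compactly with the same values as the dput() output, and the name `closes` is hypothetical. Because both columns are stored as character, the sketch assumes it is safe to convert them to numeric before comparing with zero.

```r
# Same values as my.data.1 in the question, rebuilt compactly.
my.data.1 <- data.frame(
  TOTSND_Clear = c("0", "0", "0", "0", "0", "0", "4.00", "0", "0",
                   "10.00", "0", "0", "12.00", "0", "-5.00"),
  TOTSND_UNBAL = c("0", "1.00", "0", "0", "0", "0", "0", "0", "0",
                   "0", "0", "0", "0", "0", "0"),
  stringsAsFactors = FALSE
)

# `closes` (hypothetical name) is TRUE on rows where a session ends,
# i.e. where either dollar amount is non-zero.
closes <- as.numeric(my.data.1$TOTSND_Clear) != 0 |
  as.numeric(my.data.1$TOTSND_UNBAL) != 0

which(closes)
# [1]  2  7 10 13 15
```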

I would like to produce these results:

my.data.results<-structure(list(TOTSND_Clear = c("0", "0", "0", "0", "0", "0", 
"4.00", "0", "0", "10.00", "0", "0", "12.00", "0", "-5.00"), 
    TOTSND_UNBAL = c("0", "1.00", "0", "0", "0", "0", "0", "0", 
    "0", "0", "0", "0", "0", "0", "0"), session = c(1, 1, 2, 
    2, 2, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5)), .Names = c("TOTSND_Clear", 
"TOTSND_UNBAL", "session"), row.names = c(NA, 15L), class = "data.frame")

Which looks like this (first rows only):

TOTSND_Clear    TOTSND_UNBAL    session
    0              0              1
    0              1.00           1
    0              0              2
    0              0              2
    0              0              2
    0              0              2
    4.00           0              2

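I would post code, but I have no idea where to start. I have found ways to number the instances, but they assign the shared number to the rows after the previous clear rather than to the rows leading up to each clear.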

2 Answers:

Answer 0 (score: 2)

Maybe something like this...?

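# Row indices where a session ends (either amount is non-zero)
ind <- which(with(my.data.1, TOTSND_Clear != 0 | TOTSND_UNBAL != 0))
# Repeat each session number once per row in that session
rep(seq_along(ind), times = c(ind[1], diff(ind)))
#  [1] 1 1 2 2 2 2 2 3 3 3 4 4 4 5 5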

You can then add that as a column.
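A minimal sketch of that last step, using the same values as the question's data. It assumes the final row closes a session, as it does here; if the data ended with rows that had not yet been cleared, the vector returned by rep() would be shorter than the data frame and the lengths would not match.

```r
# Same values as my.data.1 in the question, rebuilt compactly.
my.data.1 <- data.frame(
  TOTSND_Clear = c("0", "0", "0", "0", "0", "0", "4.00", "0", "0",
                   "10.00", "0", "0", "12.00", "0", "-5.00"),
  TOTSND_UNBAL = c("0", "1.00", "0", "0", "0", "0", "0", "0", "0",
                   "0", "0", "0", "0", "0", "0"),
  stringsAsFactors = FALSE
)

# Rows that end a session
ind <- which(with(my.data.1, TOTSND_Clear != 0 | TOTSND_UNBAL != 0))

# Assumption: the last row is a closing row, so the run lengths sum to nrow().
stopifnot(tail(ind, 1) == nrow(my.data.1))

# Attach the session numbers as a new column
my.data.1$session <- rep(seq_along(ind), times = c(ind[1], diff(ind)))

my.data.1$session
#  [1] 1 1 2 2 2 2 2 3 3 3 4 4 4 5 5
```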

Answer 1 (score: 2)

Here is one way:

c(1, cumsum(diff(as.logical(rowSums(
  my.data.1[c("TOTSND_Clear", "TOTSND_UNBAL")] != 0))) < 0) + 1)

# [1] 1 1 2 2 2 2 2 3 3 3 4 4 4 5 5
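The diff()-based expression above detects a session boundary as a drop from a closing row to a non-closing row, so two closing rows in a row would be merged into one session. The sample data has no such case, so whether that matters is an assumption about the real data. As a hedged alternative sketch (not either answerer's code), each row's session can be computed as one plus the number of closing rows strictly before it; the names `closes` and `session` are hypothetical.

```r
# Same values as my.data.1 in the question, rebuilt compactly.
my.data.1 <- data.frame(
  TOTSND_Clear = c("0", "0", "0", "0", "0", "0", "4.00", "0", "0",
                   "10.00", "0", "0", "12.00", "0", "-5.00"),
  TOTSND_UNBAL = c("0", "1.00", "0", "0", "0", "0", "0", "0", "0",
                   "0", "0", "0", "0", "0", "0"),
  stringsAsFactors = FALSE
)

# TRUE on rows where either amount is non-zero (the row ends a session)
closes <- rowSums(my.data.1[c("TOTSND_Clear", "TOTSND_UNBAL")] != 0) > 0

# Shift the flags down by one row and accumulate, so a closing row keeps
# its own session number and the next row starts a new one.
session <- cumsum(c(FALSE, head(closes, -1))) + 1

session
#  [1] 1 1 2 2 2 2 2 3 3 3 4 4 4 5 5
```

This version also returns one value per row when the data ends with rows that have not been cleared yet; those trailing rows simply get the next session number.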