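How do I define an indicator?

Time: 2019-09-30 04:09:10

Tags: r dataframe

I have a grouping column named PERNO. Within each group, if col2 == "a" in row i, I want to define an indicator that is 1 from row i + 1 to the end of the group, or until "a" is reached again.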

      PERNO     col2      col3
        1         b         3
        1         d         3
        1         a         4
        1         d         5
        2         v         2
        2         a         3
        2         a         4
        2         x         4
        2         h         5

Output

      PERNO     col2      col3     indicator
        1         b         3          0 
        1         d         3          0
        1         a         4          0
        1         d         5          1
        2         v         2          0
        2         a         3          0
        2         a         4          0
        2         x         4          1
        2         h         5          1

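In the first group, the fourth row is 1 because it is the row right after col2 == "a". The same holds for the last 2 rows of the second group.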

4 answers:

Answer 0 (score: 2)

Answer 1 (score: 2)

We can compare row_number() with the last index in each group where col2 == "a":

library(dplyr)
df %>%
  group_by(PERNO) %>%
  mutate(indicator = as.integer(row_number() > max(which(col2 == "a"))))

#  PERNO col2   col3 indicator
#  <int> <fct> <int>     <int>
#1     1 b         3         0
#2     1 d         3         0
#3     1 a         4         0
#4     1 d         5         1
#5     2 v         2         0
#6     2 a         3         0
#7     2 a         4         0
#8     2 x         4         1
#9     2 h         5         1

To implement the same logic in base R, we can use ave:

as.integer(with(df, ave(col2 == "a", PERNO, FUN = function(x) 
                   seq_along(x) > max(which(x)))))
#[1] 0 0 0 1 0 0 0 1 1

With data.table:

library(data.table)
setDT(df)[, indicator := as.integer(seq_len(.N) > max(which(col2 == "a"))), by = PERNO]
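One caveat, offered as a sketch rather than as part of the answer above: if a group contains no "a" at all, which() returns an empty vector, max() of it is -Inf (with a warning), and every row of that group ends up flagged 1. A guarded base R variant might look like the following. The helper name after_last_a and the extra group 3 are made up for illustration.

```r
# Hypothetical helper: 1 for rows after the last "a" in a group, else 0.
# A group without any "a" gets all zeros instead of relying on max(integer(0)).
after_last_a <- function(is_a) {
  idx <- which(is_a)
  if (length(idx) == 0L) return(integer(length(is_a)))
  as.integer(seq_along(is_a) > idx[length(idx)])
}

# Sample data from the question plus an assumed group 3 that has no "a"
df <- data.frame(
  PERNO = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 3L, 3L),
  col2  = c("b", "d", "a", "d", "v", "a", "a", "x", "h", "x", "y")
)

df$indicator <- as.integer(ave(df$col2 == "a", df$PERNO, FUN = after_last_a))
df$indicator
# [1] 0 0 0 1 0 0 0 1 1 0 0
```

Whether a group without "a" should be all 0 is an assumption here; the question does not cover that case.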

Answer 2 (score: 2)

An attempt that I think should work:

ave(dat$col2=="a", dat$PERNO, FUN=function(x) cummax(x) & (!x) )
#[1] FALSE FALSE FALSE  TRUE FALSE FALSE FALSE  TRUE  TRUE

ave(dat$col2=="a", dat$PERNO, FUN=function(x) cummax(x) & (!x) ) + 0
#[1] 0 0 0 1 0 0 0 1 1

Similarly in data.table:

dat[, ind := cummax(col2=="a") & col2 != "a", by=PERNO]

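The logic is essentially that the indicator should stay 0 until an "a" is found, at which point the cumulative maximum (cummax) rises to 1 and stays at 1 until the end of the group. Any further "a" rows should set the indicator back to 0, which is why those rows are excluded with col2 != "a" (!x).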
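To make that reasoning concrete, here is a small walk-through on a single group. The vector x is a made-up example, not data from the question.

```r
x <- c("b", "a", "d", "a", "a", "h")   # hypothetical single group
is_a <- x == "a"                       # FALSE TRUE FALSE TRUE TRUE FALSE

seen_a <- cummax(is_a)    # 0 1 1 1 1 1: switches to 1 at the first "a"
ind    <- seen_a & !is_a  # keep 1 only on rows that are not "a" themselves

as.integer(ind)
# [1] 0 0 1 0 0 1
```

Note that the row between the two runs of "a" (the "d") is flagged 1, which matches the "until it reaches "a" again" wording of the question.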

Answer 3 (score: 1)

We can use which together with tail:

library(dplyr)
df1 %>%
    group_by(PERNO) %>%
    mutate(indicator = +(row_number() > tail(which(col2 == "a"), 1)))
# A tibble: 9 x 4
# Groups:   PERNO [2]
#  PERNO col2   col3 indicator
#  <int> <chr> <int>     <int>
#1     1 b         3         0
#2     1 d         3         0
#3     1 a         4         0
#4     1 d         5         1
#5     2 v         2         0
#6     2 a         3         0
#7     2 a         4         0
#8     2 x         4         1
#9     2 h         5         1

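The same logic in base R:

# x is already the logical vector col2 == "a", so which(x) gives the positions of "a"
df1$indicator <- as.integer(with(df1, ave(col2 == "a", PERNO, FUN = function(x)
                seq_along(x) > tail(which(x), 1))))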

Or with data.table:

library(data.table)
setDT(df1)[, indicator := +(seq_len(.N) > tail(which(col2 == "a"), 1)), PERNO]

Data

df1 <- structure(list(PERNO = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L), 
    col2 = c("b", "d", "a", "d", "v", "a", "a", "x", "h"), col3 = c(3L, 
    3L, 4L, 5L, 2L, 3L, 4L, 4L, 5L)), class = "data.frame", row.names = c(NA, 
-9L))
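As a rough cross-check using base R only, the sketch below compares the "after the last a" approach (max(which(...)) or tail(which(...), 1)) with the cummax approach. They agree on the sample data but differ once a non-"a" row sits between two "a" rows. The helper names last_a and any_a and the vector g are assumptions for illustration.

```r
df1 <- data.frame(
  PERNO = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L),
  col2  = c("b", "d", "a", "d", "v", "a", "a", "x", "h"),
  col3  = c(3L, 3L, 4L, 5L, 2L, 3L, 4L, 4L, 5L)
)

# Rows strictly after the last "a" of the group
last_a <- function(is_a) seq_along(is_a) > tail(which(is_a), 1)
# Non-"a" rows that come after any earlier "a" of the group
any_a  <- function(is_a) cummax(is_a) & !is_a

by_last <- as.integer(ave(df1$col2 == "a", df1$PERNO, FUN = last_a))
by_any  <- as.integer(ave(df1$col2 == "a", df1$PERNO, FUN = any_a))
identical(by_last, by_any)
# [1] TRUE

# Hypothetical group with a non-"a" row between two "a" rows
g <- c("a", "x", "a", "y") == "a"
as.integer(last_a(g))
# [1] 0 0 0 1
as.integer(any_a(g))
# [1] 0 1 0 1
```

Which result is wanted depends on how "until it reaches "a" again" is read; the cummax version flags the row between the two "a" rows, the last-index version does not.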