R data.table filter based on the number of rows meeting a condition

Time: 2019-09-16 08:14:03

Tags: r filter count data.table

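I am learning data.table and am stuck at one point. I need help understanding how to achieve the following. With the data I have, I need to filter out brands that have sales of 0 in the first period or that do not have sales greater than 0 in at least 14 periods. I have tried, and I think I have the first part working, but I cannot work out the second part: filtering out the brands that do not have sales greater than 0 in at least 14 periods.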

Below are the sample data and the code I have written. Any suggestions on how to achieve the second part?

library(data.table)
#### set the seed value
set.seed(9901)

#### create the sample variables for creating the data
group <- sample(1:7, 1200, replace = TRUE)
brn <- sample(1:10, 1200, replace = TRUE)
period <- rep(101:116, 75)
sales <- sample(0:50, 1200, replace = TRUE)

#### create the data.table
df1 <- data.table(group, brn, period, sales)

#### taking the minimum value by group x brand x period
df1_min <- df1[, .(min1 = min(sales, na.rm = TRUE)), by = c('group', 'brn', 'period')][order(group, brn, period)]

#### creating the filter: 1 where the first period (101) has a minimum sale of 0
df1_min[, fil1 := ifelse(period == 101 & min1 == 0, 1, 0)]

Thanks!!

1 Answer:

Answer 0: (score: 2)

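This assumes the first restriction applies to the minimum period in the entire dataset (101), which means that brn/group pairs that start with 0 sales in a period later than 101 are still included.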
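To make that assumption concrete, here is a minimal sketch on a hand-built toy table (the table `toy` and the names `global_first` and `pair_first` are made up for illustration and are not part of the question's data). It contrasts the global first period with a per-pair first period, assuming one row per brn/group/period.

```r
library(data.table)

# Hypothetical toy data: pair (brn 1, group 1) starts in period 101,
# pair (brn 2, group 1) only starts in period 102, with 0 sales there.
toy <- data.table(
  brn    = c(1L, 1L, 2L, 2L),
  group  = 1L,
  period = c(101L, 102L, 102L, 103L),
  sales  = c(5L, 3L, 0L, 4L)
)

# Global reading: only zero sales in the overall minimum period (101) count.
global_first <- toy[period == min(period) & sales == 0, .(brn, group)]

# Per-pair reading: zero sales in each pair's own earliest period count.
# which.min() picks the first row of the earliest period, so this assumes
# a single row per brn/group/period.
pair_first <- toy[, .(first_sales = sales[which.min(period)]),
                  by = .(brn, group)][first_sales == 0, .(brn, group)]
```

Under the global reading no pair is flagged here, while the per-pair reading flags brn 2. Which one is wanted depends on what "first period" means in the question.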

# 1. brn/group pairs with sales of 0 in the 1st period.
brngroup_zerosales101 = df1[sales == 0 & period == min(period), .(brn, group)]

# 2a. Identify brn/group pairs with <14 positive sale periods
df1[, posSale := ifelse(sales > 0, 1, 0)] # Was the period sale positive?

# 2b. For each brn/group pair, sum posSale and filter posSale < 14
brngroup_sub14 = df1[, .(GroupBrnPosSales = sum(posSale)), by = .(brn, group)][GroupBrnPosSales < 14, .(brn, group)]

# 3. Join the two restrictions
restr = rbindlist(list(brngroup_zerosales101, brngroup_sub14)) 

df1[, ID := paste(brn, group)] # Create a brn-group ID
restr[, ID := paste(brn, group)] # See above

filtered = df1[!(ID %in% restr[,ID]),]
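As a possible alternative to building a pasted ID column, the same exclusion can be sketched with a data.table anti-join (`X[!Y, on = ...]`). The toy table and the names `zero_first`, `few_pos`, and `kept` below are hypothetical and chosen only so the result is easy to check by eye; the logic otherwise follows the steps above.

```r
library(data.table)

# Hypothetical toy data: three brn/group pairs, 16 periods each, one row per period.
toy <- data.table(
  brn    = rep(1:3, each = 16),
  group  = 1L,
  period = rep(101:116, times = 3),
  sales  = 5L
)
toy[brn == 2 & period == 101, sales := 0L]         # zero sales in the first period
toy[brn == 3 & period %in% 102:104, sales := 0L]   # only 13 positive periods

# 1. Pairs with zero sales in the overall first period
zero_first <- unique(toy[sales == 0 & period == min(period), .(brn, group)])

# 2. Pairs with fewer than 14 positive-sales rows
few_pos <- toy[, .(n_pos = sum(sales > 0)), by = .(brn, group)][n_pos < 14, .(brn, group)]

# 3. Combine both restrictions and drop the matching pairs with an anti-join
restr <- unique(rbindlist(list(zero_first, few_pos)))
kept  <- toy[!restr, on = .(brn, group)]
```

Here only brn 1 survives. The anti-join keys on the two columns directly, which avoids adding a helper column to either table.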