Question

I am working on the following problem and would really appreciate your help.

Suppose I have a data table with the following information.

Store   Day           In stock ?  Out of stock ?
Store A 01 - 01 - 19  1           0
Store A 02 - 01 - 19  0           1
Store A 03 - 01 - 19  0           1
Store A 04 - 01 - 19  1           0
Store A 05 - 01 - 19  1           0
Store A 06 - 01 - 19  0           1
Store A 07 - 01 - 19  0           1
Store A       …       0           1
Store B 01 - 01 - 19  1           0
Store B 02 - 01 - 19  0           1
Store B       …       0           1

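For each store, I want to count the number of consecutive days in stock or out of stock. The two columns are binary and mutually exclusive. So for Store A the result would be: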

Store     Duration in stock   Duration out of stock  
Store A   1
Store A                       2
Store A   2
Store A                       3

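I need to do this for a large dataset (hourly values for hundreds of stores), so I would like to automate it. I also want to run further analysis on "Duration in stock" and "Duration out of stock", such as means, extremes and percentiles, so the result needs to be organized in a way that makes this possible.

I have not found a way to solve this yet. Any insight would be great!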

The same data as semicolon-separated values:

Store;Day;In stock?;Out of stock?
Store A;01-01-19;1;0
Store A;02-01-19;0;1
Store A;03-01-19;0;1
Store A;04-01-19;1;0
Store A;05-01-19;1;0
Store A;06-01-19;0;1
Store A;07-01-19;0;1
Store A;…;0;1
Store B;01-01-19;1;0
Store B;02-01-19;0;1
Store B;…;0;1

Answer 1

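Here is an approach with dplyr. First, assuming the dates are in day-month-year order (hence dmy), I use lubridate to convert the Day column to dates.

Then, for each store, I compute which "stock period" we are in; a new period starts every time the status switches between in stock and out of stock.

Grouping by that period and by store, I sum the numbers in each column.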

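library(dplyr)

df %>%
  mutate(Day = lubridate::dmy(Day)) %>%   # "01-01-19" -> 2019-01-01
  arrange(Store, Day) %>%                 # runs only make sense in date order
  group_by(Store) %>%
  # Start a new period whenever In_stock differs from the previous row.
  # The default of -1L never equals 0 or 1, so the first row of each store
  # opens period 1 (a character default such as "" fails in recent dplyr).
  mutate(stock_period = cumsum(In_stock != lag(In_stock, default = -1L))) %>%
  group_by(Store, stock_period) %>%
  summarise(start = min(Day),
            end   = max(Day),
            In_stock = sum(In_stock),
            Out_of_stock = sum(Out_of_stock))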

# A tibble: 6 x 6
# Groups:   Store [2]
  Store   stock_period start      end        In_stock Out_of_stock
  <chr>          <int> <date>     <date>        <int>        <int>
1 Store A            1 2019-01-01 2019-01-01        1            0
2 Store A            2 2019-01-02 2019-01-03        0            2
3 Store A            3 2019-01-04 2019-01-05        2            0
4 Store A            4 2019-01-06 2019-01-07        0            2
5 Store B            1 2019-01-01 2019-01-01        1            0
6 Store B            2 2019-01-02 2019-01-02        0            1
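
To see why the cumsum step labels each run, here is a minimal base R sketch on Store A's seven days from the question. The names `changed`, `stock_period` and `run_lengths` are only for illustration, and the sketch compares each value with its predecessor by indexing rather than calling dplyr's lag(), but the idea should be the same.

```r
# In-stock flags for Store A, days 1 to 7 (taken from the question)
in_stock <- c(1, 0, 0, 1, 1, 0, 0)

# TRUE wherever the value differs from the previous one;
# the first element always opens a new period
changed <- c(TRUE, in_stock[-1] != in_stock[-length(in_stock)])

# The running count of changes gives every run its own label
stock_period <- cumsum(changed)
stock_period
# [1] 1 2 2 3 3 4 4

# Number of days in each period
run_lengths <- tapply(in_stock, stock_period, length)
```

Each period label covers one unbroken stretch of identical values, which is what lets the later group_by(Store, stock_period) sum within a single run.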

Using this source data:

df <- read.table(header = TRUE, stringsAsFactors = FALSE,
  text = "Store Day In_stock Out_of_stock
'Store A' 01-01-19 1 0
'Store A' 02-01-19 0 1
'Store A' 03-01-19 0 1
'Store A' 04-01-19 1 0
'Store A' 05-01-19 1 0
'Store A' 06-01-19 0 1
'Store A' 07-01-19 0 1
'Store B' 01-01-19 1 0
'Store B' 02-01-19 0 1")

Answer 2

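Hopefully this is close enough.

But first, a point of clarification.
When we say "show us your data", we don't need a facsimile of what you are working with, just something functionally identical or similar to your data. Usually that means simply limiting the number of rows you include; other times, for privacy reasons, it involves changing column names or values, but as far as processing is concerned the data stay the same.

First, I try to reproduce something similar to your data: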

set.seed(4)

Day <- as.Date(0:8, origin="2019-01-01")
Store <- rep(paste("Store", LETTERS[1:3]), each=length(Day))
In <- sample(c(0, 0, 1), length(Store), rep=TRUE)
Out <- abs(In - 1)
Day <- format(rep(Day, length=length(Store)), "%d - %m - %y")

dtf <- data.frame(Store, Day, In, Out)
head(dtf)
#     Store          Day In Out
# 1 Store A 01 - 01 - 19  0   1
# 2 Store A 02 - 01 - 19  0   1
# 3 Store A 03 - 01 - 19  0   1
# 4 Store A 04 - 01 - 19  0   1
# 5 Store A 05 - 01 - 19  1   0

Given this data, the following should produce the desired result.

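# Run-length encode In within each store: one data frame of
# (lengths, values) per store
io <- with(dtf, tapply(In, Store, function(x) as.data.frame(rle(x)[1:2])))
io <- do.call(rbind, io)

# Row names look like "Store A.1"; strip the whole numeric suffix
# (also when it has several digits) to recover the store name
iod <- with(io, 
  data.frame(Store=sub("\\.[0-9]+$", "", rownames(io)),
             Duration.in.stock=lengths*values,
             Duration.out.of.stock=lengths*!values
  )
)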

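# Blank out the zeros for display only; this turns the duration columns
# into character, so skip this step if you want to analyse them numerically
iod[iod == 0] <- ""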

iod
#      Store Duration.in.stock Duration.out.of.stock
# 1  Store A                                       4
# 2  Store A                 1                      
# 3  Store A                                       1
# 4  Store A                 3                      
# 5  Store B                                       1
# 6  Store B                 1                      
# 7  Store B                                       2
# 8  Store B                 1                      
# 9  Store B                                       2
# 10 Store B                 1                      
# 11 Store B                                       1
# 12 Store C                 4                      
# 13 Store C                                       3
# 14 Store C                 1
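
For the follow-up analysis mentioned in the question (means, extremes, percentiles), it may be easier to keep the durations numeric and in long form, one row per run. The following is only a sketch under assumptions: the toy input repeats the question's Store A and Store B rows, the rows are assumed to be in chronological order within each store, and the column names `Status` and `Duration` are my own choice.

```r
# Toy input: one row per store and day, already sorted by day
dtf <- data.frame(
  Store = rep(c("Store A", "Store B"), times = c(7, 2)),
  In    = c(1, 0, 0, 1, 1, 0, 0,  1, 0)
)

# One row per run: store, status and run length (kept numeric)
runs <- do.call(rbind, lapply(split(dtf, dtf$Store), function(d) {
  r <- rle(d$In)
  data.frame(Store    = d$Store[1],
             Status   = ifelse(r$values == 1, "in stock", "out of stock"),
             Duration = r$lengths)
}))
rownames(runs) <- NULL

# Summary statistics per store and status
mean_dur <- aggregate(Duration ~ Store + Status, data = runs, FUN = mean)
max_dur  <- aggregate(Duration ~ Store + Status, data = runs, FUN = max)
p90_dur  <- aggregate(Duration ~ Store + Status, data = runs,
                      FUN = quantile, probs = 0.9)
```

With hourly data the same code should apply unchanged; the durations are then counted in hours instead of days.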

How to calculate the time between occurrences (and the duration of occurrences)
