Finding consecutive zeros in a row

Time: 2017-12-15 07:20:27

Tags: r dataframe

I have a dataset of sales over several months, and I need to find the clients who stopped buying.

Clients     Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
Client 1    123 768 678 452 213 123 55  10  0   0   0   0
Client 2    549 542 21  321 31  59  998 0   546 980 0   987
Client 3    500 0   500 0   500 0   500 0   500 0   500 0
Client 4    126 545 2315    268 126 56  0   0   0   0   0   
Client 5    546 546 0   0   0   328 486 326 0   0   66  0
Client 6    0   0   0   25  78  563 698 631 230 53  0   0

So I assume Client 1 and Client 4 have stopped working with us. How can I find them? Or, how can I find the rows with more than 3 consecutive zeros?

3 answers:

Answer 0: (score: 1)

#Had to fix Client 4, one number was missing
DF <- read.table(text = 'Clients     Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
                 "Client 1"    123 768 678 452 213 123 55  10  0   0   0   0
                 "Client 2"    549 542 21  321 31  59  998 0   546 980 0   987
                 "Client 3"    500 0   500 0   500 0   500 0   500 0   500 0
                 "Client 4"    126 545 2315 27  268 126 56  0   0   0   0   0   
                 "Client 5"    546 546 0   0   0   328 486 326 0   0   66  0
                 "Client 6"    0   0   0   25  78  563 698 631 230 53  0   0', header = TRUE)

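Loop over the rows, reverse each one, and find the position of the first non-zero entry; subtracting 1 gives the number of trailing zeros. If a client never made a purchase (the whole row is zero), use length(x) instead.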

# Count trailing zeros per row; any(x != 0) avoids coercing numeric values to logical
n <- apply(DF[, -1], 1, function(x) if (any(x != 0)) which.max(rev(x) != 0) - 1 else length(x))
#[1] 4 0 1 5 1 2

DF$Clients[n >= 3]
#[1] Client 1 Client 4
#Levels: Client 1 Client 2 Client 3 Client 4 Client 5 Client 6
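
The code above counts trailing zeros only. For the second part of the question (rows with more than 3 consecutive zeros anywhere in the row), one possible sketch uses base R's rle(). The helper name longest_zero_run and the sales matrix are made up for this illustration; the values mirror the corrected table above, and the data is assumed to contain no NA values.

```r
# Hypothetical helper: length of the longest run of zeros in a numeric vector.
longest_zero_run <- function(x) {
  runs <- rle(x == 0)                      # run-length encode the "is zero" flags
  zero_runs <- runs$lengths[runs$values]   # keep only the runs of TRUE (zeros)
  if (length(zero_runs) == 0) 0L else max(zero_runs)
}

# Same values as the corrected table above, stored as a matrix for brevity
sales <- rbind(
  "Client 1" = c(123, 768, 678, 452, 213, 123, 55, 10, 0, 0, 0, 0),
  "Client 2" = c(549, 542, 21, 321, 31, 59, 998, 0, 546, 980, 0, 987),
  "Client 3" = c(500, 0, 500, 0, 500, 0, 500, 0, 500, 0, 500, 0),
  "Client 4" = c(126, 545, 2315, 27, 268, 126, 56, 0, 0, 0, 0, 0),
  "Client 5" = c(546, 546, 0, 0, 0, 328, 486, 326, 0, 0, 66, 0),
  "Client 6" = c(0, 0, 0, 25, 78, 563, 698, 631, 230, 53, 0, 0)
)

longest <- apply(sales, 1, longest_zero_run)
longest
# Client 1 Client 2 Client 3 Client 4 Client 5 Client 6
#        4        1        1        5        3        3

rownames(sales)[longest > 3]
# [1] "Client 1" "Client 4"
```

Unlike the trailing-zero count, this would also flag a client who paused for more than 3 months and later came back, so which of the two fits depends on what "stopped buying" should mean.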

Answer 1: (score: 1)

Another idea via base R could be:

k <- 3
df$Clients[rowSums(df[-c(1:(ncol(df) - k))] == 0) == k]
#[1] Client1 Client4
#Levels: Client1 Client2 Client3 Client4 Client5 Client6
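
To make the column indexing in that one-liner easier to follow, here is a small sketch on a made-up data frame (the clients A, B, C and their values are hypothetical). It shows that the negative index keeps just the last k columns, and that tail() over the column positions is an equivalent way to write it.

```r
# Toy data, invented for illustration only
df <- data.frame(
  Clients = c("A", "B", "C"),
  Jan = c(5, 0, 7), Feb = c(0, 0, 0), Mar = c(0, 0, 2), Apr = c(0, 0, 0),
  stringsAsFactors = FALSE
)
k <- 3

# Dropping the first ncol(df) - k columns leaves the last k columns
last_k <- df[tail(seq_along(df), k)]
identical(last_k, df[-c(1:(ncol(df) - k))])
# [1] TRUE

# A row qualifies when all k of its last values are zero
all_zero <- rowSums(last_k == 0) == k
df$Clients[all_zero]
# [1] "A" "B"
```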

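Alternatively, with dplyr (plus tidyr for gather), we can convert to long format, take the last 3 values for each client, filter the groups where all of those values are 0, and then pull Clients:
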
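library(dplyr)
library(tidyr)  # gather() comes from tidyr, not dplyr

k <- 3
v1 <- df %>% 
       gather(var, val, -Clients) %>%   # wide -> long, one row per client and month
       group_by(Clients) %>% 
       slice((n() - k + 1):n()) %>%     # last k rows per client ((n()-k):n() would take k + 1)
       filter(all(val == 0)) %>% 
       pull(Clients)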

unique(v1)
#[1] Client1 Client4
#Levels: Client1 Client2 Client3 Client4 Client5 Client6
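
gather() has since been superseded in tidyr. A rough equivalent using pivot_longer() and slice_tail() might look like the sketch below, assuming tidyr >= 1.0.0 and dplyr >= 1.0.0 are installed. The small df and the column names month and sales are made up for the illustration.

```r
library(dplyr)
library(tidyr)

# Toy data, invented for illustration: only the last four months are shown
df <- data.frame(
  Clients = c("Client1", "Client2", "Client3"),
  Sep = c(0, 546, 0), Oct = c(0, 980, 0), Nov = c(0, 0, 500), Dec = c(0, 987, 0),
  stringsAsFactors = FALSE
)
k <- 3

stopped <- df %>%
  pivot_longer(cols = -Clients, names_to = "month", values_to = "sales") %>%
  group_by(Clients) %>%
  slice_tail(n = k) %>%            # last k months for each client
  filter(all(sales == 0)) %>%      # keep clients whose last k months are all zero
  distinct(Clients) %>%
  pull(Clients)

stopped
# [1] "Client1"
```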

Answer 2: (score: 0)

data <- data.frame(Clients = c("Client 1",  "Client 2", "Client 3", "Client 4", "Client 5", "Client 6"),
               Jan = c(123,549,500,126,546,0), 
               Feb = c(768,542,0,545,546,0), 
               Mar = c(678,21,500,2315,0,0),
               Apr= c(452,321,0,0,0,25),
               May= c(213,31,500,268,0,78),
               Jun= c(123,59,0,126,328,563),
               Jul= c(55,998,500,56,486,698),
               Aug= c(10,0,0,0,326,631),
               Sep= c(0,546,500,0,0,230),
               Oct= c(0,980,0,0,0,53),
               Nov= c(0,0,500,0,66,0),
               Dec= c(0,987,0,0,0,0))

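library(dplyr)  # needed for %>% and mutate()

# Sum the last 3 columns per client, then label clients whose sum is zero as "Left"
data_Clean <- data %>%
  mutate(Client_Stat = rowSums(data[,(ncol(data)-2):ncol(data)]))%>%
  mutate(Client_Status = ifelse(Client_Stat < 1,"Left","with us"))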

In this case, you will only get the clients who had no transactions in the last 3 months.

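Description: we sum the last 3 columns and check the result. If the sum is greater than 0, the client is still with us; otherwise, the client has left.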
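
One caveat with the sum-based check: it matches "all three months are zero" only when the values are non-negative whole amounts. If the data could contain refunds (negative values) or amounts below 1, counting the non-zero entries may be safer. A minimal sketch on made-up data (clients A, B, C and their values are hypothetical):

```r
# Toy data, invented for illustration: client B has a sale and an equal refund
data <- data.frame(
  Clients = c("A", "B", "C"),
  Oct = c(0, 50, 0), Nov = c(0, -50, 20), Dec = c(0, 0, 0)
)
last3 <- data[, (ncol(data) - 2):ncol(data)]

by_sum  <- unname(rowSums(last3) < 1)        # the sum-based check
by_zero <- unname(rowSums(last3 != 0) == 0)  # TRUE only if every month is exactly zero

data.frame(Clients = data$Clients, by_sum, by_zero)
#   Clients by_sum by_zero
# 1       A   TRUE    TRUE
# 2       B   TRUE   FALSE
# 3       C  FALSE   FALSE
```

For the sample data in this thread the two checks agree, since all values there are non-negative integers.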

Hope this helps.