Finding month-on-month turnover

Time: 2018-12-17 13:35:26

Tags: r dplyr

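I seem to be stuck on a very basic problem. I know it is simple, but I can't figure it out.

My data has HireDate and TermDate. TermDate is an employee's last day.

I want to do this:

Leavers = number of rows per month, taken from TermDate

Turnover for a given month = leavers in that month / AVG(number of rows in the previous month and in the current month)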
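As a worked instance of the formula, the sketch below plugs in one month's numbers. The helper name `turnover` and the figures (3 leavers, 32 people present in the previous month, 36 in the current month) are illustrative assumptions, not part of the question.

```r
# Turnover for one month: leavers divided by the average of the
# previous and current month's headcount (hypothetical helper).
turnover <- function(leavers, prev_headcount, cur_headcount) {
  leavers / mean(c(prev_headcount, cur_headcount))
}

# Illustrative numbers: 3 leavers, 32 people last month, 36 this month.
rate <- turnover(leavers = 3, prev_headcount = 32, cur_headcount = 36)
print(round(rate, 2))
```

The average headcount here is (32 + 36) / 2 = 34, so the rate is 3 / 34.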

Reproducible data


2 answers:

Answer 0: (score: 1)

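library(dplyr)
df %>% 
  # label each row with the month and year of its termination date
  mutate(leavemonth=strftime(TermDate,format="%m-%Y")) %>% 
  group_by(leavemonth) %>% 
  # count the rows (leavers) in each month
  summarize(n=n())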

# A tibble: 51 x 2
   leavemonth     n
   <chr>      <int>
 1 01-2007        1
 2 01-2008        1
 3 01-2009        1
 4 01-2013        1
 5 01-2017        1
 6 02-2005        1
 7 02-2007        1
 8 02-2011        1
 9 02-2015        2
10 03-2009        2
# ... with 41 more rows

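I create a column holding a month-year identifier for each row's termination date, then count the rows per identifier with summarize.

If you only want to add n to the existing table, replace summarize with add_count: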
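One thing to watch with the `%m-%Y` label: it is a character string, so the summary sorts by month number first (01-2007, 01-2008, ...) rather than chronologically. A minimal base R sketch, using made-up dates, suggests that a year-first label such as `%Y-%m` sorts in calendar order:

```r
# Made-up termination dates, for illustration only.
term <- as.Date(c("2008-01-15", "2007-02-10", "2007-01-05", "2007-02-20"))

month_first <- strftime(term, format = "%m-%Y")
year_first  <- strftime(term, format = "%Y-%m")

# table() counts rows per label and orders the labels alphabetically.
counts_month_first <- table(month_first)
counts_year_first  <- table(year_first)

print(names(counts_month_first))  # month-first labels: not in calendar order
print(names(counts_year_first))   # year-first labels: calendar order
```

This matters for the turnover calculation, where each month has to be compared with the one directly before it.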

df %>% 
  mutate(leavemonth=strftime(TermDate,format="%m-%Y")) %>% 
  # keep every row and append the per-month count as column n
  add_count(leavemonth)

# A tibble: 100 x 4
   HireDate   TermDate   leavemonth     n
   <date>     <date>     <chr>      <int>
 1 2018-06-20 NA         NA            34
 2 2006-04-04 2006-10-18 10-2006        2
 3 2016-04-04 2018-06-30 06-2018        2
 4 2017-01-01 NA         NA            34
 5 2003-10-10 2005-04-07 04-2005        2
 6 2008-01-01 2012-03-09 03-2012        3
 7 2003-09-08 2005-04-04 04-2005        2
 8 2007-08-20 2015-02-27 02-2015        2
 9 2010-06-29 2016-11-30 11-2016        3
10 2015-12-16 2016-05-23 05-2016        1
# ... with 90 more rows
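Note that in the output above the rows with no TermDate form a group of their own (n = 34): that figure counts people who are still employed, not leavers. A possible way around it is to drop those rows before counting. The sketch below does this in base R on made-up data so that it runs without extra packages; in a dplyr pipeline the same effect should come from a `filter(!is.na(TermDate))` step before `add_count`.

```r
# Made-up data: NA in TermDate marks someone who has not left.
df <- data.frame(
  TermDate = as.Date(c("2018-06-30", NA, "2018-06-12", NA, "2018-05-23"))
)

# Keep only rows with a termination date before counting.
left <- df[!is.na(df$TermDate), , drop = FALSE]
left$leavemonth <- strftime(left$TermDate, format = "%m-%Y")

# ave() repeats the per-group row count on every row, like add_count().
left$n <- ave(seq_len(nrow(left)), left$leavemonth, FUN = length)
print(left)
```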

Answer 1: (score: 1)

A bit verbose, but it works:

library(data.table)

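# Make sure both columns are Dates (setDT converts df by reference)
df_leavers <- setDT(df)[, `:=` (TermDate = as.Date(as.character(TermDate)),
                                HireDate = as.Date(as.character(HireDate)))]

# Independent copy for the headcount, taken before TermDate is overwritten
df_presences <- copy(df_leavers)

# Leavers per month: label each TermDate as YYYY-MM and count the non-NA rows
df_leavers <- df_leavers[, TermDate := format(TermDate, "%Y-%m")][!is.na(TermDate), .(Leavers = .N), by = TermDate]

# Presences per month: current employees get the latest TermDate,
# then each row is expanded into one entry per month of employment
df_presences <- df_presences[, maxTerm := max(TermDate, na.rm = TRUE)][
  is.na(TermDate), TermDate := maxTerm][
    , .(YearMonth = format(seq(HireDate, TermDate, by = "month"), "%Y-%m")), by = 1:nrow(df)][
      , .(Presences = .N), by = YearMonth]

# Attach the leaver counts to every month that has a headcount
df_final <- df_leavers[df_presences, on = .(TermDate = YearMonth)]

setnames(df_final, c("YearMonth", "Leavers", "Presences"))

# Months without leavers get 0; turnover = leavers / average of previous and current headcount
df_final <- df_final[is.na(Leavers), Leavers := 0L][order(YearMonth),][, previousMonth := shift(Presences)][
  is.na(previousMonth), previousMonth := 0L][, AvgPresences := (Presences + previousMonth) / 2][
    , Turnover := round(Leavers / AvgPresences, 2)][, "previousMonth" := NULL]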

Output (head and tail of the dataset):

     YearMonth Leavers Presences AvgPresences Turnover
  1:   1999-04       0         1          0.5     0.00
  2:   1999-05       0         2          1.5     0.00
  3:   1999-06       0         2          2.0     0.00
  4:   1999-07       0         2          2.0     0.00
  5:   1999-08       0         2          2.0     0.00
 ---                                                  
227:   2018-02       0        32         32.5     0.00
228:   2018-03       3        36         34.0     0.09
229:   2018-04       0        33         34.5     0.00
230:   2018-05       1        34         33.5     0.03
231:   2018-06       2        36         35.0     0.06
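To make the steps behind this table easier to follow, here is a minimal base R sketch of the same idea on a four-row toy data set. The data, the helper `first_of_month`, and the variable names are assumptions made for illustration. Unlike the answer, it steps from the first day of each month, so the month of departure is always counted as a month of presence; treat it as a sketch rather than a drop-in replacement.

```r
# Hypothetical toy data; NA in TermDate means still employed.
df <- data.frame(
  HireDate = as.Date(c("2017-01-01", "2017-01-15", "2017-02-01", "2017-03-10")),
  TermDate = as.Date(c("2017-03-20", NA, "2017-03-05", NA))
)

# Treat current employees as present up to the latest termination date,
# as the data.table answer does.
end_date <- df$TermDate
end_date[is.na(end_date)] <- max(df$TermDate, na.rm = TRUE)

# Hypothetical helper: first day of the month a date falls in.
first_of_month <- function(d) as.Date(format(d, "%Y-%m-01"))

# One "YYYY-MM" label per employee per month of presence.
present <- unlist(lapply(seq_len(nrow(df)), function(i) {
  months <- seq(first_of_month(df$HireDate[i]),
                first_of_month(end_date[i]), by = "month")
  format(months, "%Y-%m")
}))

presences <- table(present)
leavers_by_month <- table(format(df$TermDate[!is.na(df$TermDate)], "%Y-%m"))

result <- data.frame(
  YearMonth = names(presences),
  Presences = as.integer(presences),
  stringsAsFactors = FALSE
)

# Look up the leaver count for each month; months without leavers get 0.
idx <- match(result$YearMonth, names(leavers_by_month))
result$Leavers <- as.integer(leavers_by_month)[idx]
result$Leavers[is.na(result$Leavers)] <- 0L

# Previous month's headcount; the first month has none, so use 0
# as in the data.table answer.
prev <- c(0L, head(result$Presences, -1))
result$AvgPresences <- (result$Presences + prev) / 2
result$Turnover <- round(result$Leavers / result$AvgPresences, 2)
print(result)
```

The sketch assumes every month between the first and last label has at least one person present; a month with nobody present would be missing from `presences`, and the "previous month" shift would then skip over it.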