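I seem to be stuck on a very basic problem. I know it is simple, but I can't figure it out.
My data has a HireDate and a TermDate. TermDate is an employee's last day.
I want to compute:
Leavers = number of TermDate values that fall in the month
Turnover for a given month = Leavers in that month / AVG(row count of the previous month, row count of the current month)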
Answer 0 (score: 1)
library(dplyr)
df %>%
mutate(leavemonth=strftime(TermDate,format="%m-%Y")) %>%
group_by(leavemonth) %>%
summarize(n=n())
# A tibble: 51 x 2
leavemonth n
<chr> <int>
1 01-2007 1
2 01-2008 1
3 01-2009 1
4 01-2013 1
5 01-2017 1
6 02-2005 1
7 02-2007 1
8 02-2011 1
9 02-2015 2
10 03-2009 2
# ... with 41 more rows
I create a column that holds the month-year of each row's termination date as an identifier, then use summarize to count the rows per month.
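For comparison, the same count can be sketched in base R. The data frame below is made up for illustration and is not the asker's data. The key uses "%Y-%m" instead of "%m-%Y" so that the months sort chronologically as text, and rows without a TermDate are left out of the count.

```r
# Hypothetical sample data, not the asker's data set
df <- data.frame(
  HireDate = as.Date(c("2003-09-08", "2003-10-10", "2006-04-04", "2017-01-01")),
  TermDate = as.Date(c("2005-04-04", "2005-04-07", "2006-10-18", NA))
)

# Year-month key; "%Y-%m" sorts chronologically as a string
leavemonth <- format(df$TermDate, "%Y-%m")

# table() ignores NA by default, so current employees are not counted
leavers <- as.data.frame(table(leavemonth), stringsAsFactors = FALSE)
names(leavers) <- c("leavemonth", "n")
print(leavers)
```

Whether to drop the NA rows depends on what the count is for; here they are employees who have not left, so they are not leavers.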
If you only want to add n to the existing table, you can replace summarize with add_count:
df %>%
mutate(leavemonth=strftime(TermDate,format="%m-%Y")) %>%
add_count(leavemonth)
# A tibble: 100 x 4
HireDate TermDate leavemonth n
<date> <date> <chr> <int>
1 2018-06-20 NA NA 34
2 2006-04-04 2006-10-18 10-2006 2
3 2016-04-04 2018-06-30 06-2018 2
4 2017-01-01 NA NA 34
5 2003-10-10 2005-04-07 04-2005 2
6 2008-01-01 2012-03-09 03-2012 3
7 2003-09-08 2005-04-04 04-2005 2
8 2007-08-20 2015-02-27 02-2015 2
9 2010-06-29 2016-11-30 11-2016 3
10 2015-12-16 2016-05-23 05-2016 1
# ... with 90 more rows
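Note that in the output above every row with a missing TermDate falls into the same NA group and gets n = 34. If current employees should not receive a leaver count, one possible approach is sketched below in base R. The four rows are borrowed from the printed output purely for illustration, so the counts differ from the full data set, and the helper name `lookup` is an assumption of this sketch.

```r
# A few rows borrowed from the printed output, for illustration only
df <- data.frame(
  HireDate = as.Date(c("2018-06-20", "2006-04-04", "2003-10-10", "2003-09-08")),
  TermDate = as.Date(c(NA, "2006-10-18", "2005-04-07", "2005-04-04"))
)
df$leavemonth <- format(df$TermDate, "%m-%Y")

# Count leavers per month; table() skips NA by default
counts <- table(df$leavemonth)
lookup <- setNames(as.integer(counts), names(counts))

# Look up each row's month; rows with no TermDate get NA instead of a count
df$n <- unname(lookup[df$leavemonth])
print(df)
```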
Answer 1 (score: 1)
A bit verbose, but it works:
library(data.table)

# Convert to data.table by reference and coerce both columns to Date
df_leavers <- setDT(df)[, `:=` (TermDate = as.Date(as.character(TermDate)),
                                HireDate = as.Date(as.character(HireDate)))]
# Copy before TermDate is overwritten with a "%Y-%m" string below
df_presences <- copy(df_leavers)

# Leavers per month: count the non-missing TermDate values in each year-month
df_leavers <- df_leavers[, TermDate := format(TermDate, "%Y-%m")][
  !is.na(TermDate), .(Leavers = .N), by = TermDate]

# Presences per month: treat current employees as present until the latest TermDate,
# expand each row into one entry per month employed, then count the entries per month
df_presences <- df_presences[, maxTerm := max(TermDate, na.rm = TRUE)][
  is.na(TermDate), TermDate := maxTerm][
  , .(YearMonth = format(seq(HireDate, TermDate, by = "month"), "%Y-%m")), by = 1:nrow(df)][
  , .(Presences = .N), by = YearMonth]

# Join leavers onto presences; months without leavers get 0
df_final <- df_leavers[df_presences, on = .(TermDate = YearMonth)]
setnames(df_final, c("YearMonth", "Leavers", "Presences"))
df_final <- df_final[is.na(Leavers), Leavers := 0][order(YearMonth),][, previousMonth := shift(Presences)][
  is.na(previousMonth), previousMonth := 0][, AvgPresences := (Presences + previousMonth) / 2][
  , Turnover := round(Leavers / AvgPresences, 2)][, "previousMonth" := NULL]
Output (first and last rows of the data set):
YearMonth Leavers Presences AvgPresences Turnover
1: 1999-04 0 1 0.5 0.00
2: 1999-05 0 2 1.5 0.00
3: 1999-06 0 2 2.0 0.00
4: 1999-07 0 2 2.0 0.00
5: 1999-08 0 2 2.0 0.00
---
227: 2018-02 0 32 32.5 0.00
228: 2018-03 3 36 34.0 0.09
229: 2018-04 0 33 34.5 0.00
230: 2018-05 1 34 33.5 0.03
231: 2018-06 2 36 35.0 0.06
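As a sanity check, the formula from the question can be re-applied by hand to the last rows printed above. The sketch below uses base R and only the numbers shown in that output; the data frame name `check` is an assumption of this example. The first row is used only to supply the previous month's headcount.

```r
# Values copied from the last rows of the output above
check <- data.frame(
  YearMonth = c("2018-02", "2018-03", "2018-04", "2018-05", "2018-06"),
  Leavers   = c(0, 3, 0, 1, 2),
  Presences = c(32, 36, 33, 34, 36)
)

# Headcount of the previous month (not available for the first row shown here)
previous <- c(NA, head(check$Presences, -1))

# Turnover = leavers / mean(previous headcount, current headcount)
check$AvgPresences <- (check$Presences + previous) / 2
check$Turnover <- round(check$Leavers / check$AvgPresences, 2)
print(check[-1, ])
```

The recomputed AvgPresences and Turnover values for 2018-03 through 2018-06 agree with the printed table. For the very first month in the data (1999-04) the answer's code treats the missing previous month as 0, which is why AvgPresences is 0.5 there.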