Converting multi-year weekly and bimonthly payment data to monthly totals in R

Time: 2017-06-08 17:04:53

Tags: r

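In R: I have joined two files. Both contain commission payment data; there are two files because the pay period differs by job code. For example, all job codes in file one are paid commission weekly, whereas all job codes in file two are paid commission bimonthly. To analyze the data accurately and fairly, I need to sum the payments by month for each employee ID (currently a factor) into a new field (let's call it "monthly pay"). My problem is that I seem to be summing each employee's pay by month successfully, but it currently ignores the year, so the same month in different years gets lumped together. I'm not opposed to coding by year and month, or to dummy-coding the pay months as 1-24 starting from June 6, 2015, but I'd like to know whether there is a way to do all of this in one step.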
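The 1-24 dummy coding mentioned above can be computed directly from the check date. A minimal sketch, assuming the count starts at the calendar month of June 6, 2015 (so every check dated in June 2015 is month 1); `pay_month_index` is a hypothetical helper name, not part of any package:

```r
# Hypothetical helper: number calendar months consecutively,
# with the month containing `start` as month 1
pay_month_index <- function(dates, start = as.Date("2015-06-06")) {
  y  <- as.integer(format(dates, "%Y"))
  m  <- as.integer(format(dates, "%m"))
  y0 <- as.integer(format(start, "%Y"))
  m0 <- as.integer(format(start, "%m"))
  (y - y0) * 12L + (m - m0) + 1L
}

check_dates <- as.Date(c("2015-06-06", "2015-12-18", "2016-01-08", "2017-05-31"))
pay_month_index(check_dates)
# [1]  1  7  8 24
```

Because the index keeps increasing across years, grouping by EMPLID plus this single column keeps, for example, December 2015 (month 7) apart from December 2016 (month 19).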

Current data:

Check_DT   EMPLID   DEPTID JOBCODE PAY_FREQUENCY MAX._TTL.GROSS
2015-12-18 99999999 23231606  100880             W           1203
2015-12-24 99999999 23231606  100880             W            597
2015-12-31 99999999 23231606  100880             W            625
2016-01-08 99999999 23231606  100880             W            245
2016-01-13 99999999 23231606  100880             W            480
2016-01-15 99999999 23231606  100880             W            758
2016-01-22 99999999 23231606  100880             W            599
2016-01-29 99999999 23231606  100880             W            551
2016-02-05 99999999 23231606  100880             W            767
2016-02-12 99999999 23231606  100880             W            880
2016-02-19 99999999 23231606  100880             W            557
2016-02-26 99999999 20441606  100880             W            909
2016-03-04 99999999 20441606  100880             W            989
2016-03-11 99999999 20441606  100880             W            751
2016-03-18 99999999 20441606  100880             W            776
2016-03-25 99999999 20441606  100880             W            770
2016-04-01 99999999 20441606  100880             W            712
2016-04-08 99999999 20441606  100880             W            602
2016-04-15 99999999 20441606  100880             W            798
2016-04-22 99999999 20441606  100880             W            527

What I want (what I actually need, since I will be running a cluster analysis):

Check_DT   EMPLID   DEPTID JOBCODE PAY_FREQUENCY MAX._TTL.GROSS Year Month  Pay
2015-12-18 99999999 23231606  100880             W           1203 2015    12 2425
2015-12-24 99999999 23231606  100880             W            597
2015-12-31 99999999 23231606  100880             W            625
2016-01-08 99999999 23231606  100880             W            245 2016    01 2633
2016-01-13 99999999 23231606  100880             W            480
2016-01-15 99999999 23231606  100880             W            758
2016-01-22 99999999 23231606  100880             W            599
2016-01-29 99999999 23231606  100880             W            551
2016-02-05 99999999 23231606  100880             W            767
2016-02-12 99999999 23231606  100880             W            880
2016-02-19 99999999 23231606  100880             W            557
2016-02-26 99999999 20441606  100880             W            909
2016-03-04 99999999 20441606  100880             W            989
2016-03-11 99999999 20441606  100880             W            751
2016-03-18 99999999 20441606  100880             W            776
2016-03-25 99999999 20441606  100880             W            770
2016-04-01 99999999 20441606  100880             W            712
2016-04-08 99999999 20441606  100880             W            602
2016-04-15 99999999 20441606  100880             W            798
2016-04-22 99999999 20441606  100880             W            527
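And so on. I'm also fine with the year and month values being repeated on every row of a year/month combination; I can get rid of the duplicates. As a reminder, some people in the file are paid weekly and others bimonthly.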
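The target table above prints Year, Month and Pay only on the first row of each employee-month. A minimal base R sketch of that layout on the December 2015 and January 2016 sample rows, assuming NA is an acceptable stand-in for the empty cells; the data frame name `pay` is only illustrative:

```r
# Sample rows from the question (December 2015 and January 2016 only)
pay <- data.frame(
  Check_DT = as.Date(c("2015-12-18", "2015-12-24", "2015-12-31", "2016-01-08",
                       "2016-01-13", "2016-01-15", "2016-01-22", "2016-01-29")),
  EMPLID = "99999999",
  MAX._TTL.GROSS = c(1203, 597, 625, 245, 480, 758, 599, 551),
  stringsAsFactors = FALSE
)
pay$Year  <- as.integer(format(pay$Check_DT, "%Y"))
pay$Month <- format(pay$Check_DT, "%m")   # keeps the leading zero, e.g. "01"

# Monthly total per employee, repeated on every row of the group
pay$Pay <- ave(pay$MAX._TTL.GROSS, pay$EMPLID, pay$Year, pay$Month, FUN = sum)

# Blank out Year/Month/Pay except on the first row of each employee-month
first_row <- !duplicated(pay[c("EMPLID", "Year", "Month")])
pay[!first_row, c("Year", "Month", "Pay")] <- NA
pay
```

Since weekly and bimonthly checks are both just rows with a date, this monthly sum does not depend on PAY_FREQUENCY.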

Here is what I have tried:

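#Convert weekly/bimonthly pay to monthly sum of pay
library(lubridate)  # provides month()
paydat_all$monthlypay <- month(paydat_all$Check_DT)
# Groups by month number and EMPLID only, so the same month in different years is summed together
aggregate(MAX._TTL.GROSS~monthlypay+EMPLID, FUN = sum, data = paydat_all)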
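One way to keep the years apart in the attempt above is to add the year as a further term in the aggregate() formula. A minimal base R sketch using format() instead of lubridate; it uses a few of the question's rows plus one HYPOTHETICAL December 2016 row (not in the question's data) so that two Decembers are present:

```r
# Question rows plus one HYPOTHETICAL 2016-12-09 row (not from the question)
paydat_all <- data.frame(
  Check_DT = as.Date(c("2015-12-18", "2015-12-24", "2015-12-31",
                       "2016-01-08", "2016-01-13", "2016-12-09")),
  EMPLID = rep("99999999", 6),
  MAX._TTL.GROSS = c(1203, 597, 625, 245, 480, 100),
  stringsAsFactors = FALSE
)

# Year and month as separate grouping columns (base R, no lubridate needed)
paydat_all$Year  <- as.integer(format(paydat_all$Check_DT, "%Y"))
paydat_all$Month <- as.integer(format(paydat_all$Check_DT, "%m"))

# With Year in the formula, 2015-12 and 2016-12 are summed separately
monthly <- aggregate(MAX._TTL.GROSS ~ EMPLID + Year + Month,
                     data = paydat_all, FUN = sum)
monthly
```

This returns one row per employee-month, which is already the de-duplicated form.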

2 Answers:

Answer 0 (score: 1)

This should give you the result you are looking for:

library(lubridate)
library(dplyr)

data = 'Check_DT   EMPLID   DEPTID JOBCODE PAY_FREQUENCY MAX._TTL.GROSS
"2015-12-18" 99999999 23231606  100880             W           1203
"2015-12-24" 99999999 23231606  100880             W            597
"2015-12-31" 99999999 23231606  100880             W            625
"2016-01-08" 99999999 23231606  100880             W            245
"2016-01-13" 99999999 23231606  100880             W            480
"2016-01-15" 99999999 23231606  100880             W            758
"2016-01-22" 99999999 23231606  100880             W            599
"2016-01-29" 99999999 23231606  100880             W            551
"2016-02-05" 99999999 23231606  100880             W            767
"2016-02-12" 99999999 23231606  100880             W            880
"2016-02-19" 99999999 23231606  100880             W            557
"2016-02-26" 99999999 20441606  100880             W            909
"2016-03-04" 99999999 20441606  100880             W            989
"2016-03-11" 99999999 20441606  100880             W            751
"2016-03-18" 99999999 20441606  100880             W            776
"2016-03-25" 99999999 20441606  100880             W            770
"2016-04-01" 99999999 20441606  100880             W            712
"2016-04-08" 99999999 20441606  100880             W            602
"2016-04-15" 99999999 20441606  100880             W            798
"2016-04-22" 99999999 20441606  100880             W            527'

paydat_all <- read.table(text=data, header=TRUE, 
                         colClasses=c("Date", "character", "character", 
                                      "character", "factor", "integer"))

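# Note: DEPTID is also a grouping column, so a department change mid-month
# (2016-02-26 in this sample) splits February 2016 into two rows
paydat_all <- paydat_all %>%
              mutate(Year = year(Check_DT),
                     Month = month(Check_DT)) %>%
              group_by(EMPLID, DEPTID, JOBCODE, Year, Month) %>%
              summarise(Pay = sum(MAX._TTL.GROSS))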

Answer 1 (score: 1)

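Consider base R's ave() for inline aggregation, where:

  • the first argument is the column to aggregate
  • the one or more comma-separated arguments after it are the grouping factors
  • an explicitly named FUN argument gives the type of aggregation.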
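A toy illustration of those three points, on made-up vectors rather than the question's data. FUN has to be named because it comes after `...` in ave(x, ..., FUN = mean); an unnamed sum would be swallowed by `...` as another grouping argument:

```r
amounts <- c(100, 200, 300, 400, 500)
emp     <- c("A", "A", "B", "B", "B")
yr      <- c(2015, 2016, 2016, 2016, 2016)

# One grouping factor: total per employee, repeated on every row
ave(amounts, emp, FUN = sum)
# [1]  300  300 1200 1200 1200

# Two grouping factors: total per employee and year
ave(amounts, emp, yr, FUN = sum)
# [1]  100  200 1200 1200 1200
```

Unlike aggregate(), the result has the same length as the input, which is why it can be assigned straight back as a new column.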

R script

data = 'Check_DT   EMPLID   DEPTID JOBCODE PAY_FREQUENCY MAX._TTL.GROSS
"2015-12-18" 99999999 23231606  100880             W           1203
"2015-12-24" 99999999 23231606  100880             W            597
"2015-12-31" 99999999 23231606  100880             W            625
"2016-01-08" 99999999 23231606  100880             W            245
"2016-01-13" 99999999 23231606  100880             W            480
"2016-01-15" 99999999 23231606  100880             W            758
"2016-01-22" 99999999 23231606  100880             W            599
"2016-01-29" 99999999 23231606  100880             W            551
"2016-02-05" 99999999 23231606  100880             W            767
"2016-02-12" 99999999 23231606  100880             W            880
"2016-02-19" 99999999 23231606  100880             W            557
"2016-02-26" 99999999 20441606  100880             W            909
"2016-03-04" 99999999 20441606  100880             W            989
"2016-03-11" 99999999 20441606  100880             W            751
"2016-03-18" 99999999 20441606  100880             W            776
"2016-03-25" 99999999 20441606  100880             W            770
"2016-04-01" 99999999 20441606  100880             W            712
"2016-04-08" 99999999 20441606  100880             W            602
"2016-04-15" 99999999 20441606  100880             W            798
"2016-04-22" 99999999 20441606  100880             W            527'

paydat_all <- read.table(text=data, header=TRUE, 
                         colClasses=c("Date", "character", "character", 
                                      "character", "factor", "integer"))
# MONTH AND TWO-DIGIT YEAR, EXTRACTED WITH FORMAT() AS CHARACTER COLUMNS
paydat_all[c("Month", "Year")] <- sapply(c("%m", "%y"), 
                                         function(d) format(paydat_all$Check_DT, d))

# SUM PAY WITHIN EACH MONTH x YEAR x EMPLID GROUP (THREE GROUP-BY VARS);
# AVE() REPEATS THE GROUP TOTAL ON EVERY ROW
paydat_all$PaySum <- ave(paydat_all$`MAX._TTL.GROSS`, paydat_all$Month, 
                         paydat_all$Year, paydat_all$EMPLID, FUN=sum)
head(paydat_all)
#     Check_DT   EMPLID   DEPTID JOBCODE PAY_FREQUENCY MAX._TTL.GROSS Month Year PaySum
# 1 2015-12-18 99999999 23231606  100880             W           1203   12    15   2425
# 2 2015-12-24 99999999 23231606  100880             W            597   12    15   2425
# 3 2015-12-31 99999999 23231606  100880             W            625   12    15   2425
# 4 2016-01-08 99999999 23231606  100880             W            245   01    16   2633
# 5 2016-01-13 99999999 23231606  100880             W            480   01    16   2633
# 6 2016-01-15 99999999 23231606  100880             W            758   01    16   2633
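The ave() result keeps every pay check and repeats the monthly total on each row. Since the question says the duplicates can be removed afterwards, here is a minimal sketch of that step with base R duplicated(), on the first six sample rows; the names `keys` and `monthly` are only illustrative:

```r
pay <- data.frame(
  Check_DT = as.Date(c("2015-12-18", "2015-12-24", "2015-12-31",
                       "2016-01-08", "2016-01-13", "2016-01-15")),
  EMPLID = "99999999",
  MAX._TTL.GROSS = c(1203, 597, 625, 245, 480, 758),
  stringsAsFactors = FALSE
)
pay$Month  <- format(pay$Check_DT, "%m")
pay$Year   <- format(pay$Check_DT, "%y")
pay$PaySum <- ave(pay$MAX._TTL.GROSS, pay$Month, pay$Year, pay$EMPLID, FUN = sum)

# Keep the first row of each EMPLID/Year/Month group and only the summary columns
keys <- c("EMPLID", "Year", "Month")
monthly <- pay[!duplicated(pay[keys]), c(keys, "PaySum")]
rownames(monthly) <- NULL
monthly
```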