Sum and average different columns by month, given individual dates (possibly duplicated) that can be collapsed into month ranges

Time: 2014-07-19 19:53:11

Tags: r

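I am trying to work out how to total Revenue per month when the data contains many individual dates (some of them duplicated) that need to be collapsed into whole months. I want to do the same for Hours, and finally I want the average of Index over each whole month. The goal is to enter a property name and get back that property's monthly total revenue, total hours, and average index. Ideally the output would show the totals for every month across all available dates.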

Here is the dataset:

Property  Date      Revenue  Hours  Index
Stanlist  7/12/12   878.67   38     7.26339
Stanlist  7/12/12   647.56   28     7.26339
Stanlist  7/13/12   345.24   83     7.86339
Stanlist  7/14/12   838.48   45     8.26339
Stanlist  7/14/12   153.56   89     8.26339
Stanlist  7/15/12   877.34   12     9.26339
Stanlist  7/15/12   848.57   78     9.26339
Stanlist  8/12/13   329.24   39     6.26339
Stanlist  8/12/13   656.77   39     6.26339
Stanlist  8/13/13   478.45   38     9.86339
Stanlist  12/14/13  784.56   78     8.26339
Stanlist  12/14/13  866.76   67     8.26339
Stanlist  12/15/13  648.46   78     7.56339
Stanlist  3/15/14   569.34   39     8.26339

Desired result:

Property  Date      Revenue        Hours     Index
Stanlist 8/1/13     1003900.00     7384      6.26339    
Stanlist 9/1/13     89156.77       6374      6.26339    
Stanlist 10/1/13    73838.93       3894      9.86339    
Stanlist 11/1/13    927393.89      9732      8.26339    
Stanlist 12/1/13    67239.93       7383      8.26339     
Stanlist 3/1/14     74893.98       7484      7.56339    
Stanlist 4/1/14     89274.32       7484      8.26339    

3 Answers:

Answer 0 (score: 1)

First, I turned your data into a reproducible example:

df <- data.frame(
  Property = c("Stanlist", "Stanlist", "Stanlist", "Stanlist", "Stanlist", "Stanlist", "Stanlist",
               "Stanlist", "Stanlist", "Stanlist", "Stanlist", "Stanlist", "Stanlist", "Stanlist"),
  Date     = c("7/12/12", "7/12/12", "7/13/12", "7/14/12", "7/14/12", "7/15/12", "7/15/12",
               "8/12/13", "8/12/13", "8/13/13", "12/14/13", "12/14/13", "12/15/13", "3/15/14"),
  Revenue  = c(878.67, 647.56, 345.24, 838.48, 153.56, 877.34, 848.57,
               329.24, 656.77, 478.45, 784.56, 866.76, 648.46, 569.34),
  Hours    = c(38, 28, 83, 45, 89, 12, 78, 39, 39, 38, 78, 67, 78, 39),
  Index    = c(7.26339, 7.26339, 7.86339, 8.26339, 8.26339, 9.26339, 9.26339,
               6.26339, 6.26339, 9.86339, 8.26339, 8.26339, 7.56339, 8.26339)
)

Next, we create a month identifier and two helper functions:

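# Month key for every row, e.g. "072012" "072012" "072012" ...
# tapply() orders the groups by this string, so months sort before years.
df_month <- strftime(strptime(df$Date, "%m/%e/%y"), "%m%Y")
# Apply FUN to x within each month group
stat     <- function(x, FUN) tapply(x, df_month, FUN = FUN)
# Label a group with the first day of its month, e.g. "07/1/12"
month    <- function(x) strftime(strptime(x[1], "%m/%e/%y"), "%m/1/%y")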
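To make the month key concrete, here is a minimal sketch of how the `strptime`/`strftime` round trip turns individual dates into one identifier per month. The `dates` vector holds a few dates from the question. `sortable_key` is an illustrative alternative, not part of the answer, that puts the year first so that string order matches chronological order.

```r
# A few dates from the question, in m/d/yy format
dates <- c("7/12/12", "7/15/12", "8/13/13", "12/14/13")

# Parse the strings, then format them back keeping only month and year
parsed    <- strptime(dates, "%m/%d/%y")
month_key <- strftime(parsed, "%m%Y")

# Assumption for illustration: a year-first key that sorts chronologically
sortable_key <- strftime(parsed, "%Y-%m")

print(month_key)     # "072012" "072012" "082013" "122013"
print(sortable_key)  # "2012-07" "2012-07" "2013-08" "2013-12"
```

Both July dates map to the same key, which is what lets `tapply` treat them as one group.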

Our final data.frame:

out <- data.frame(mapply(stat, df, list(function(x) x[1], month, sum, sum, mean)),
                  row.names = NULL)

# Property     Date Revenue Hours            Index
# Stanlist  03/1/14  569.34    39          8.26339
# Stanlist  07/1/12 4589.42   373 8.20624714285714
# Stanlist  08/1/13 1464.46   116          7.46339
# Stanlist  12/1/13 2299.78   223 8.03005666666667
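The result above is ordered by the `%m%Y` string rather than chronologically, and the unrounded `8.20624714285714` suggests the columns may come back as text. As a hedged alternative, here is a base-R sketch built on `aggregate` that keeps the columns numeric, sorts by a real `Date`, and takes the property name as input, as the question asks. The function name `monthly_summary` and the `Month` column are hypothetical names chosen for this example.

```r
# Data copied from the question
df <- data.frame(
  Property = rep("Stanlist", 14),
  Date     = c("7/12/12", "7/12/12", "7/13/12", "7/14/12", "7/14/12", "7/15/12", "7/15/12",
               "8/12/13", "8/12/13", "8/13/13", "12/14/13", "12/14/13", "12/15/13", "3/15/14"),
  Revenue  = c(878.67, 647.56, 345.24, 838.48, 153.56, 877.34, 848.57,
               329.24, 656.77, 478.45, 784.56, 866.76, 648.46, 569.34),
  Hours    = c(38, 28, 83, 45, 89, 12, 78, 39, 39, 38, 78, 67, 78, 39),
  Index    = c(7.26339, 7.26339, 7.86339, 8.26339, 8.26339, 9.26339, 9.26339,
               6.26339, 6.26339, 9.86339, 8.26339, 8.26339, 7.56339, 8.26339),
  stringsAsFactors = FALSE
)

# Hypothetical helper: monthly totals and mean index for one property
monthly_summary <- function(data, property) {
  rows <- data[data$Property == property, ]
  # Floor each date to the first day of its month
  rows$Month <- as.Date(format(as.Date(rows$Date, format = "%m/%d/%y"), "%Y-%m-01"))
  # Sum Revenue and Hours, average Index, then join the two results
  sums  <- aggregate(cbind(Revenue, Hours) ~ Property + Month, data = rows, FUN = sum)
  means <- aggregate(Index ~ Property + Month, data = rows, FUN = mean)
  out   <- merge(sums, means, by = c("Property", "Month"))
  out[order(out$Month), ]
}

monthly <- monthly_summary(df, "Stanlist")
print(monthly)
```

This sketch assumes the property exists in the data; a name with no matching rows would make `aggregate` fail, so a real version would want to check for that first.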

Answer 1 (score: 1)

I would avoid loops and use data.table instead (assuming dat is your dataset):

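library(data.table)
# Floor every date to the first day of its month
dat$Date <- as.Date(paste0(format(strptime(as.character(dat$Date), "%m/%d/%y"), "%Y/%m"), "/1"))
# Sum Revenue and Hours and average Index within each Property/month group
setDT(dat)[, list(Revenue = sum(Revenue),
                  Hours = sum(Hours),
                  Index = mean(Index)), by = list(Property, Date)]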

#    Property       Date Revenue Hours    Index
# 1: Stanlist 2012-07-01 4589.42   373 8.206247
# 2: Stanlist 2013-08-01 1464.46   116 7.463390
# 3: Stanlist 2013-12-01 2299.78   223 8.030057
# 4: Stanlist 2014-03-01  569.34    39 8.263390
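The date line in this answer floors each date to the first of its month by round-tripping through strings. As an alternative sketch, not what the answer uses, base R's `cut()` on `Date` objects with `breaks = "month"` labels each date with the first day of its calendar month. The variable names `d` and `month_start` are only for illustration.

```r
# A few dates from the question, parsed as Date objects
d <- as.Date(c("7/12/12", "8/13/13", "12/15/13", "3/15/14"), format = "%m/%d/%y")

# cut() bins each date into its calendar month; the factor labels are
# the first day of each month, so they convert straight back to Date
month_start <- as.Date(as.character(cut(d, breaks = "month")))

print(month_start)
```

The resulting `month_start` vector could then serve as the grouping column in place of the reformatted `Date`.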

Answer 2 (score: 0)

Using dplyr, where df (taken from @Robert Krzyzanowski's example) is the dataset:

library(dplyr)
df %>%
  mutate(ind = gsub("\\/.*\\/", "/1/", Date)) %>%  # replace the day between the slashes (`/../`) with `1`
  group_by(Property, ind) %>%
  summarize(Revenue = sum(Revenue), Hours = sum(Hours), Index = mean(Index))
#Source: local data frame [4 x 5]
#Groups: Property

#    Property     ind Revenue Hours    Index
#  1 Stanlist 12/1/13 2299.78   223 8.030057
#  2 Stanlist  3/1/14  569.34    39 8.263390
#  3 Stanlist  7/1/12 4589.42   373 8.206247
#  4 Stanlist  8/1/13 1464.46   116 7.463390
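The `gsub` step is easiest to see on a few values. The sketch below applies an equivalent pattern to some dates from the question. Because `ind` is plain text, the rows in the output above follow string order (`12/1/13` first). The last two lines show one assumed way to get chronological order by parsing the labels as dates; `chron` is a name introduced here for illustration.

```r
# A few dates from the question
dates <- c("7/12/12", "12/14/13", "3/15/14", "8/12/13")

# Replace everything between the first and last slash (the day) with 1
ind <- gsub("/.*/", "/1/", dates)
print(ind)    # "7/1/12" "12/1/13" "3/1/14" "8/1/13"

# Assumption for illustration: order the labels by the dates they represent
chron <- ind[order(as.Date(ind, format = "%m/%d/%y"))]
print(chron)  # "7/1/12" "8/1/13" "12/1/13" "3/1/14"
```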