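I am trying to work out how to compute the total revenue per month from a set of dated records. A month can contain several dates, dates can repeat, and every record should be rolled up into the month it falls in. I want to do the same for hours, and for the index I want the average over the whole month. The goal is to enter a property name and get back that property's monthly total revenue, total hours, and average index. Ideally I would see the totals for every month across the full range of available dates.
Here is the dataset provided: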
Property Date Revenue Hours Index
Stanlist 7/12/12 878.67 38 7.26339
Stanlist 7/12/12 647.56 28 7.26339
Stanlist 7/13/12 345.24 83 7.86339
Stanlist 7/14/12 838.48 45 8.26339
Stanlist 7/14/12 153.56 89 8.26339
Stanlist 7/15/12 877.34 12 9.26339
Stanlist 7/15/12 848.57 78 9.26339
Stanlist 8/12/13 329.24 39 6.26339
Stanlist 8/12/13 656.77 39 6.26339
Stanlist 8/13/13 478.45 38 9.86339
Stanlist 12/14/13 784.56 78 8.26339
Stanlist 12/14/13 866.76 67 8.26339
Stanlist 12/15/13 648.46 78 7.56339
Stanlist 3/15/14 569.34 39 8.26339
Desired result...
Property Date Revenue Hours Index
Stanlist 8/1/13 1003900.00 7384 6.26339
Stanlist 9/1/13 89156.77 6374 6.26339
Stanlist 10/1/13 73838.93 3894 9.86339
Stanlist 11/1/13 927393.89 9732 8.26339
Stanlist 12/1/13 67239.93 7383 8.26339
Stanlist 3/1/14 74893.98 7484 7.56339
Stanlist 4/1/14 89274.32 7484 8.26339
Answer 0 (score: 1)
First, I turned your data into a reproducible example:
df <- data.frame(Property = c("Stanlist", "Stanlist", "Stanlist", "Stanlist", "Stanlist", "Stanlist", "Stanlist", "Stanlist", "Stanlist", "Stanlist", "Stanlist", "Stanlist", "Stanlist", "Stanlist"), Date = c("7/12/12", "7/12/12", "7/13/12", "7/14/12", "7/14/12", "7/15/12", "7/15/12", "8/12/13", "8/12/13", "8/13/13", "12/14/13","12/14/13", "12/15/13", "3/15/14"), Revenue = c(878.67, 647.56, 345.24, 838.48, 153.56, 877.34, 848.57, 329.24, 656.77, 478.45, 784.56, 866.76, 648.46, 569.34), Hours = c(38, 28, 83, 45, 89, 12, 78, 39, 39, 38, 78, 67, 78, 39), Index = c(7.26339,7.26339, 7.86339, 8.26339, 8.26339, 9.26339, 9.26339, 6.26339, 6.26339, 9.86339, 8.26339, 8.26339, 7.56339, 8.26339))
Next, we create a month identifier and two helper functions:
df_month <- strftime(strptime(df$Date, "%m/%e/%y"), "%m%Y") # "072012" "072012" "072012" ...
stat <- function(x, FUN) tapply(x, df_month, FUN = FUN)
month <- function(x) strftime(strptime(x[1], "%m/%e/%y"), "%m/1/%y")
Our final data.frame:
out <- data.frame(mapply(stat, df, list(function(x) x[1], month, sum, sum, mean)),
row.names = NULL)
# Property Date Revenue Hours Index
# Stanlist 03/1/14 569.34 39 8.26339
# Stanlist 07/1/12 4589.42 373 8.20624714285714
# Stanlist 08/1/13 1464.46 116 7.46339
# Stanlist 12/1/13 2299.78 223 8.03005666666667
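Two details of the result above are worth noting: the rows follow the "%m%Y" key, so March 2014 sorts ahead of July 2012, and the unrounded 8.20624714285714 suggests the columns come back as text rather than numbers. A minimal base R sketch that keeps the columns numeric and the months in chronological order could look like the following. It assumes the same data as df above; Month and monthly are illustrative names, not part of the original answer.

```r
# Same data as the question (rebuilt here so the sketch is self-contained).
df <- data.frame(
  Property = rep("Stanlist", 14),
  Date = c("7/12/12", "7/12/12", "7/13/12", "7/14/12", "7/14/12", "7/15/12",
           "7/15/12", "8/12/13", "8/12/13", "8/13/13", "12/14/13", "12/14/13",
           "12/15/13", "3/15/14"),
  Revenue = c(878.67, 647.56, 345.24, 838.48, 153.56, 877.34, 848.57,
              329.24, 656.77, 478.45, 784.56, 866.76, 648.46, 569.34),
  Hours = c(38, 28, 83, 45, 89, 12, 78, 39, 39, 38, 78, 67, 78, 39),
  Index = c(7.26339, 7.26339, 7.86339, 8.26339, 8.26339, 9.26339, 9.26339,
            6.26339, 6.26339, 9.86339, 8.26339, 8.26339, 7.56339, 8.26339),
  stringsAsFactors = FALSE
)

# Parse the dates and snap each one to the first day of its month.
parsed   <- as.Date(df$Date, format = "%m/%d/%y")
df$Month <- as.Date(format(parsed, "%Y-%m-01"))

# Sum Revenue and Hours, average Index, per Property and Month.
sums  <- aggregate(cbind(Revenue, Hours) ~ Property + Month, data = df, FUN = sum)
means <- aggregate(Index ~ Property + Month, data = df, FUN = mean)

# Combine the two summaries and sort chronologically within each property.
monthly <- merge(sums, means, by = c("Property", "Month"))
monthly <- monthly[order(monthly$Property, monthly$Month), ]
print(monthly)
```

Because Month is a real Date column, ordering and later filtering by date range work without further string handling.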
Answer 1 (score: 1)
I would avoid loops and use data.table instead (assuming dat is your dataset):
dat$Date <- as.Date(paste0(format(strptime(as.character(dat$Date), "%m/%d/%y"), "%Y/%m"),"/1"))
library(data.table)
setDT(dat)[, list(Revenue = sum(Revenue),
Hours = sum(Hours),
Index = mean(Index)), by = list(Property, Date)]
# Property Date Revenue Hours Index
# 1: Stanlist 2012-07-01 4589.42 373 8.206247
# 2: Stanlist 2013-08-01 1464.46 116 7.463390
# 3: Stanlist 2013-12-01 2299.78 223 8.030057
# 4: Stanlist 2014-03-01 569.34 39 8.263390
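The question also asks for a way to pass in a property name and get back only that property's monthly figures. One way to sketch that on top of the data.table approach is a small wrapper function. The name monthly_summary and its arguments are hypothetical, and the sketch assumes the data.table package is installed and that Date is still in the original "m/d/yy" text form.

```r
library(data.table)

# Same data as the question (rebuilt here so the sketch is self-contained).
dat <- data.frame(
  Property = rep("Stanlist", 14),
  Date = c("7/12/12", "7/12/12", "7/13/12", "7/14/12", "7/14/12", "7/15/12",
           "7/15/12", "8/12/13", "8/12/13", "8/13/13", "12/14/13", "12/14/13",
           "12/15/13", "3/15/14"),
  Revenue = c(878.67, 647.56, 345.24, 838.48, 153.56, 877.34, 848.57,
              329.24, 656.77, 478.45, 784.56, 866.76, 648.46, 569.34),
  Hours = c(38, 28, 83, 45, 89, 12, 78, 39, 39, 38, 78, 67, 78, 39),
  Index = c(7.26339, 7.26339, 7.86339, 8.26339, 8.26339, 9.26339, 9.26339,
            6.26339, 6.26339, 9.86339, 8.26339, 8.26339, 7.56339, 8.26339),
  stringsAsFactors = FALSE
)

# Hypothetical helper: monthly totals and mean index for one property.
monthly_summary <- function(dat, property) {
  dt <- as.data.table(dat)  # works on a copy, so the caller's data is untouched
  # Snap every date to the first day of its month.
  dt[, Month := as.Date(format(as.Date(Date, format = "%m/%d/%y"), "%Y-%m-01"))]
  # Keep only the requested property, then aggregate per month (sorted by keyby).
  dt[Property == property,
     .(Revenue = sum(Revenue), Hours = sum(Hours), Index = mean(Index)),
     keyby = .(Property, Month)]
}

monthly_summary(dat, "Stanlist")
```

A property name that does not occur in the data simply yields an empty result.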
Answer 2 (score: 0)
Using dplyr. If df (taken from @Robert Krzyzanowski's example) is the dataset:
library(dplyr)
df%>%
mutate(ind=gsub("\\/.*\\/","/1/",Date))%>% # replace the day between the two slashes (`/../`) with `1`
group_by(Property,ind)%>%
summarize(Revenue=sum(Revenue), Hours=sum(Hours), Index=mean(Index))
#Source: local data frame [4 x 5]
#Groups: Property
# Property ind Revenue Hours Index
# 1 Stanlist 12/1/13 2299.78 223 8.030057
# 2 Stanlist 3/1/14 569.34 39 8.263390
# 3 Stanlist 7/1/12 4589.42 373 8.206247
# 4 Stanlist 8/1/13 1464.46 116 7.463390
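The rows above come out in the order 12, 3, 7, 8 because ind is a plain string and is sorted as text. The short base R sketch below shows what the gsub call does to a few of the question's dates and how the keys could be ordered chronologically by parsing them as dates. The variable names are illustrative only.

```r
# Three dates taken from the question's data.
dates <- c("7/12/12", "12/14/13", "3/15/14")

# The greedy pattern matches from the first slash to the last one,
# so the day part is replaced by 1.
keys <- gsub("/.*/", "/1/", dates)
print(keys)        # "7/1/12"  "12/1/13" "3/1/14"

# Sorted as text, "12/..." comes before "3/..." and "7/...".
print(sort(keys))  # "12/1/13" "3/1/14"  "7/1/12"

# Parsing the keys as dates gives the chronological order instead.
as_dates <- as.Date(keys, format = "%m/%d/%y")
chronological <- keys[order(as_dates)]
print(chronological)  # "7/1/12"  "12/1/13" "3/1/14"
```

Storing the month as a Date rather than a string avoids the issue altogether.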