R: group_by not correctly summarizing the sum and the max/min dates

Time: 2019-03-28 19:04:48

Tags: r

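I have data with 6 variables: EmployeeID, JobID, Name, JobLocation, Date, and HoursWorked. I want to group the data by EmployeeID and JobID (i.e., collapse all records with the same EmployeeID and JobID into one row), then find the minimum and maximum date per group, along with the sum of all hours worked between those dates. I would like the data to end up with the columns: EmployeeID, JobID, Name, JobLocation, MinDate, MaxDate, TotalHoursWorked.

This is what I have tried so far, but MinDate, MaxDate, and TotalHoursWorked show the same value for every record.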

Data$EmployeeID<- as.factor(Data$EmployeeID) 
Data$JobID<- as.factor(Data$JobID) 
Data$Date<- as.factor(Data$Date)
Data$Date<- as.Date(Data$Date,format="%m/%d/%Y")
Data$HoursWorked<-as.numeric(Data$HoursWorked)

Data<-Data[c("EmployeeID", "Name","JobID", "JobLocation", "Date", "HoursWorked")]
Data<- Data%>% 
  group_by(Data$EmployeeID,Data$JobID, Data$Name,Data$JobLocation) %>%
  summarize(TotalHoursWorked = sum(Data$HoursWorked)) %>%
  mutate(MaxDate=max(Data$Date), MinDate=min(Data$Date))

Output of sample(Data), without the Name column:

> sample(Data)
# A tibble: 1,000 x 5
   EmployeeID HoursWorked JobID           Date       JobLocation
   <fct>            <dbl> <fct>           <date>     <chr>         
 1 32589              4   B3031-002513-00 2016-03-14 #             
 2 32590              8   B3031-002562-00 2016-04-08 #             
 3 32591              9   B3031-002564-00 2016-04-05 #             
 4 32591              2.5 B3031-002564-00 2016-04-06 #             
 5 32591              3   B3031-002562-00 2016-04-07 #             
 6 32591              7.5 B3031-002562-00 2016-04-08 #             
 7 32605              0   B3031-002348-00 2016-01-04 #             
 8 32605              3   B3031-002419-00 2016-01-04 #             
 9 32605              0   B3031-002348-00 2016-01-05 #             
10 32605              3   B3031-002419-00 2016-01-05 #             
# ... with 990 more rows

Output after running the group_by code:

> sample(Data)
# A tibble: 80 x 6
   MaxDate    `Data$JobID`    MinDate    `Data$\`Job Location\`` TotalHoursWorked `Data$EmployeeID`
   <date>     <fct>           <date>     <chr>                              <dbl> <fct>            
 1 2016-07-29 B3031-002513-00 2016-01-04 #                                  3288. 32589            
 2 2016-07-29 B3031-002562-00 2016-01-04 #                                  3288. 32590            
 3 2016-07-29 B3031-002562-00 2016-01-04 #                                  3288. 32591            
 4 2016-07-29 B3031-002564-00 2016-01-04 #                                  3288. 32591            
 5 2016-07-29 B3031-002348-00 2016-01-04 #                                  3288. 32605            
 6 2016-07-29 B3031-002419-00 2016-01-04 #                                  3288. 32605            
 7 2016-07-29 B3031-002445-00 2016-01-04 #                                  3288. 32605            
 8 2016-07-29 B3031-002502-00 2016-01-04 #                                  3288. 32605            
 9 2016-07-29 B3031-002504-00 2016-01-04 #                                  3288. 32605            
10 2016-07-29 B3031-002505-00 2016-01-04 #                                  3288. 32605            
# ... with 70 more rows

1 answer:

Answer 0: (score: 0)

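It's actually very simple: you are using mutate where you should be using just summarise. In addition, writing Data$HoursWorked or Data$Date inside the pipeline pulls the entire column from the original data frame rather than the rows of the current group, which is why every row shows the same total and the same dates. Use the bare column names instead.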
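To illustrate why the Data$ prefix defeats the grouping, here is a minimal sketch on a made-up data frame. The name toy and its values are hypothetical and are not taken from the question's data.

```r
library(dplyr)

# Hypothetical toy data, not the asker's real data set
toy <- data.frame(
  EmployeeID  = c("A", "A", "B"),
  HoursWorked = c(1, 2, 10),
  stringsAsFactors = FALSE
)

# toy$HoursWorked is the whole column, so every group receives the grand total
wrong <- toy %>%
  group_by(EmployeeID) %>%
  summarise(TotalHoursWorked = sum(toy$HoursWorked))

# The bare column name is evaluated within each group
right <- toy %>%
  group_by(EmployeeID) %>%
  summarise(TotalHoursWorked = sum(HoursWorked))
```

In wrong, both employees end up with the same total, which mirrors the repeated 3288 in the question's output. In right, each employee gets only their own hours.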

This first instruction is probably not needed; I ran it to coerce the Date column after reading in the data below.

Data$Date <- as.Date(Data$Date)

Now the solution.

library(tidyverse)

Data %>%
  group_by(EmployeeID, JobID) %>%
  summarise(TotalHoursWorked = sum(HoursWorked),
            MaxDate = max(Date), MinDate = min(Date))
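The question also asks to keep Name and JobLocation in the result. One possible way, sketched below, is to add them to the grouping keys. This assumes each EmployeeID/JobID pair has a single Name and JobLocation; if that does not hold, the pair would be split into several rows. The data frame toy and the Name values are hypothetical.

```r
library(dplyr)

# Hypothetical rows shaped like the question's data; the Name values are made up
toy <- data.frame(
  EmployeeID  = c("32591", "32591", "32591", "32591"),
  Name        = c("Name1", "Name1", "Name1", "Name1"),
  JobID       = c("B3031-002564-00", "B3031-002564-00",
                  "B3031-002562-00", "B3031-002562-00"),
  JobLocation = "#",
  Date        = as.Date(c("2016-04-05", "2016-04-06", "2016-04-07", "2016-04-08")),
  HoursWorked = c(9, 2.5, 3, 7.5),
  stringsAsFactors = FALSE
)

# Group by the two keys plus the descriptive columns so they survive summarise
result <- toy %>%
  group_by(EmployeeID, JobID, Name, JobLocation) %>%
  summarise(MinDate = min(Date),
            MaxDate = max(Date),
            TotalHoursWorked = sum(HoursWorked)) %>%
  ungroup()
```

The result has one row per EmployeeID/JobID pair, with columns in the order the question asks for: EmployeeID, JobID, Name, JobLocation, MinDate, MaxDate, TotalHoursWorked.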

Data.

Data <- read.table(text = "
EmployeeID HoursWorked JobID           Date       JobLocation
  1 32589              4   B3031-002513-00 2016-03-14 #             
2 32590              8   B3031-002562-00 2016-04-08 #             
3 32591              9   B3031-002564-00 2016-04-05 #             
4 32591              2.5 B3031-002564-00 2016-04-06 #             
5 32591              3   B3031-002562-00 2016-04-07 #             
6 32591              7.5 B3031-002562-00 2016-04-08 #             
7 32605              0   B3031-002348-00 2016-01-04 #             
8 32605              3   B3031-002419-00 2016-01-04 #             
9 32605              0   B3031-002348-00 2016-01-05 #             
10 32605              3   B3031-002419-00 2016-01-05 #   
", header = TRUE, comment.char = "")