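I have data with 6 variables: EmployeeID, JobID, Name, JobLocation, Date and HoursWorked. I want to group the data by EmployeeID and JobID (i.e. collapse all records that share the same EmployeeID and JobID into one row), then find the minimum and maximum date for each group, as well as the sum of all hours worked between those dates. I want the result to end up with the columns: EmployeeID, JobID, Name, JobLocation, MinDate, MaxDate, TotalHoursWorked.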
This is what I have tried so far, but every record shows the same values for MinDate, MaxDate and TotalHoursWorked.
Data$EmployeeID<- as.factor(Data$EmployeeID)
Data$JobID<- as.factor(Data$JobID)
Data$Date<- as.factor(Data$Date)
Data$Date<- as.Date(Data$Date,format="%m/%d/%Y")
Data$HoursWorked<-as.numeric(Data$HoursWorked)
Data<-Data[c("EmployeeID", "Name","JobID", "JobLocation", "Date", "HoursWorked")]
Data<- Data%>%
  group_by(Data$EmployeeID,Data$JobID, Data$Name,Data$JobLocation) %>%
  summarize(TotalHoursWorked = sum(Data$HoursWorked)) %>%
  mutate(MaxDate=max(Data$Date), MinDate=min(Data$Date))
Sample output of the data (sample(Data)), without the Name column:
> sample(Data)
# A tibble: 1,000 x 5
EmployeeID HoursWorked JobID Date JobLocation
<fct> <dbl> <fct> <date> <chr>
1 32589 4 B3031-002513-00 2016-03-14 #
2 32590 8 B3031-002562-00 2016-04-08 #
3 32591 9 B3031-002564-00 2016-04-05 #
4 32591 2.5 B3031-002564-00 2016-04-06 #
5 32591 3 B3031-002562-00 2016-04-07 #
6 32591 7.5 B3031-002562-00 2016-04-08 #
7 32605 0 B3031-002348-00 2016-01-04 #
8 32605 3 B3031-002419-00 2016-01-04 #
9 32605 0 B3031-002348-00 2016-01-05 #
10 32605 3 B3031-002419-00 2016-01-05 #
# ... with 990 more rows
Output after running the group_by code:
> sample(Data)
# A tibble: 80 x 6
MaxDate `Data$JobID` MinDate `Data$\`Job Location\`` TotalHoursWorked `Data$EmployeeID`
<date> <fct> <date> <chr> <dbl> <fct>
1 2016-07-29 B3031-002513-00 2016-01-04 # 3288. 32589
2 2016-07-29 B3031-002562-00 2016-01-04 # 3288. 32590
3 2016-07-29 B3031-002562-00 2016-01-04 # 3288. 32591
4 2016-07-29 B3031-002564-00 2016-01-04 # 3288. 32591
5 2016-07-29 B3031-002348-00 2016-01-04 # 3288. 32605
6 2016-07-29 B3031-002419-00 2016-01-04 # 3288. 32605
7 2016-07-29 B3031-002445-00 2016-01-04 # 3288. 32605
8 2016-07-29 B3031-002502-00 2016-01-04 # 3288. 32605
9 2016-07-29 B3031-002504-00 2016-01-04 # 3288. 32605
10 2016-07-29 B3031-002505-00 2016-01-04 # 3288. 32605
# ... with 70 more rows
Answer (score: 0)
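It's actually quite simple: you are using mutate together with summarise when you should only be using summarise. Also drop the Data$ prefixes inside the pipeline. Data$HoursWorked and Data$Date refer to the entire columns of the original data frame, not to the rows of the current group, which is why every group showed the same total and the same dates. Inside group_by() and summarise(), refer to columns by their bare names.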
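To see why the Data$ prefix produces identical values, here is a minimal sketch on a tiny hypothetical data frame (toy is an illustrative name, not from the question). Referencing toy$HoursWorked inside summarise() hands every group the full column, while the bare name HoursWorked is evaluated per group.

```r
library(dplyr)

# Hypothetical toy data: employee 1 has two rows, employee 2 has one
toy <- data.frame(
  EmployeeID  = c(1, 1, 2),
  HoursWorked = c(4, 6, 3)
)

# $-style reference: toy$HoursWorked is the whole original column,
# so every group receives the grand total (4 + 6 + 3 = 13)
wrong <- toy %>%
  group_by(EmployeeID) %>%
  summarise(Total = sum(toy$HoursWorked))

# Bare column name: dplyr evaluates it within each group's rows
right <- toy %>%
  group_by(EmployeeID) %>%
  summarise(Total = sum(HoursWorked))

wrong$Total  # 13 13
right$Total  # 10 3
```

The same thing happens with max(Data$Date) and min(Data$Date) in the question, which is why MinDate and MaxDate were the overall first and last dates for every row.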
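The first instruction is probably not needed in your case; I ran it to coerce the Date column to class Date after reading in the data below.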
Data$Date <- as.Date(Data$Date)
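For reference, a small sketch of how as.Date() handles the two date layouts involved here: the ISO form used in the sample data, and the month/day/year form the question's own code parses with format = "%m/%d/%Y". That the raw data really is month/day/year is an assumption based on that line. Without an explicit format, as.Date() only tries ISO-style layouts, so month/day/year strings need the format argument. Converting to a factor first, as the question does, is not necessary.

```r
# Default parsing: ISO year-month-day strings need no format argument
iso <- as.Date("2016-03-14")

# Month/day/year strings need an explicit format string,
# as in the question's format = "%m/%d/%Y"
us <- as.Date("03/14/2016", format = "%m/%d/%Y")

identical(iso, us)  # TRUE
```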
Now the solution.
library(tidyverse)
Data %>%
  group_by(EmployeeID, JobID) %>%
  summarise(TotalHoursWorked = sum(HoursWorked),
            MaxDate = max(Date), MinDate = min(Date))
The data:
Data <- read.table(text = "
EmployeeID HoursWorked JobID Date JobLocation
1 32589 4 B3031-002513-00 2016-03-14 #
2 32590 8 B3031-002562-00 2016-04-08 #
3 32591 9 B3031-002564-00 2016-04-05 #
4 32591 2.5 B3031-002564-00 2016-04-06 #
5 32591 3 B3031-002562-00 2016-04-07 #
6 32591 7.5 B3031-002562-00 2016-04-08 #
7 32605 0 B3031-002348-00 2016-01-04 #
8 32605 3 B3031-002419-00 2016-01-04 #
9 32605 0 B3031-002348-00 2016-01-05 #
10 32605 3 B3031-002419-00 2016-01-05 #
", header = TRUE, comment.char = "")
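The question also asks to keep Name and JobLocation in the output. A minimal sketch, assuming both columns are constant within each EmployeeID/JobID pair, is to add them to group_by() so they are carried through summarise(). The rows reuse values from the sample above, but the Name column is invented for illustration. The .groups = "drop" argument (available in dplyr 1.0.0 and later) returns an ungrouped result.

```r
library(dplyr)

# Hypothetical rows: values from the sample data, with an invented Name column
Data <- data.frame(
  EmployeeID  = c(32591, 32591, 32591, 32591),
  Name        = "Employee A",
  JobID       = c("B3031-002564-00", "B3031-002564-00",
                  "B3031-002562-00", "B3031-002562-00"),
  JobLocation = "#",
  Date        = as.Date(c("2016-04-05", "2016-04-06",
                          "2016-04-07", "2016-04-08")),
  HoursWorked = c(9, 2.5, 3, 7.5)
)

result <- Data %>%
  # Assumes Name and JobLocation never vary within an EmployeeID/JobID
  # pair, so including them here does not split any group
  group_by(EmployeeID, JobID, Name, JobLocation) %>%
  summarise(MinDate = min(Date),
            MaxDate = max(Date),
            TotalHoursWorked = sum(HoursWorked),
            .groups = "drop")  # return an ungrouped tibble

result
```

If Name or JobLocation can differ within a pair, grouping by them would split that pair into several rows; in that case group only by EmployeeID and JobID and pick a value inside summarise(), for example Name = first(Name).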