Combine two datasets using date and time (sales data with staff attendance)

Time: 2015-11-09 16:18:52

Tags: r csv tableau

> sales.csv
        CustomerName    InvoiceDate_Time InvoiceNo InvoiceValue
1    Hendricks, Eric  30-09-2015 1:00 PM        10         5000
2        Baker, Mark  30-09-2015 3:00 PM        11        12000
3   Catalano, Robert 01-10-2015 10:00 AM        12        25000
4     Eaton, Jeffrey  01-10-2015 4:00 PM        13         4000
5    Watanuki, Cathy  02-10-2015 9:00 AM        14        80000
6      Fier, Marilyn  02-10-2015 3:30 PM        15        18000
7     O'Brien, Donna  03-10-2015 1:30 PM        16        25000
8      Perez, Barney  03-10-2015 4:10 PM        17        20000
9 Fitzgerald, Jackie 04-10-2015 11:10 AM        18         6000


> StaffAttendance.csv
       EmployeeName Designation AttendanceIn.DateTime AttendanceOut.DateTime
1        Page, Lisa   Sales Rep    30-09-2015 6:50 AM     30-09-2015 2:00 PM
2    Taylor, Hector     Manager    30-09-2015 7:00 AM     30-09-2015 5:00 PM
3  Dawson, Jonathan   Sales Rep    30-09-2015 1:55 PM     30-09-2015 7:00 PM
4      Duran, Brian   Sales Rep    01-10-2015 6:50 AM     01-10-2015 7:00 PM
5       Pratt, Erik     Manager    01-10-2015 7:20 AM     01-10-2015 5:10 PM
6        Page, Lisa   Sales Rep    02-10-2015 6:55 AM     02-10-2015 6:45 PM
7    Taylor, Hector     Manager    02-10-2015 7:10 AM     02-10-2015 5:20 AM
8      Weber, Larry   Sales Rep    03-10-2015 6:50 AM     03-10-2015 6:55 PM
9       Pratt, Erik     Manager    04-10-2015 7:20 AM     04-10-2015 5:10 PM
10     Duran, Brian   Sales Rep    04-10-2015 7:10 AM     04-10-2015 7:00 PM

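As shown above, I have two data tables (CSV files) that I want to combine using date and time. How can I join them on date and time to find out which employees were on duty for each sale to a customer?

How can I save the resulting table as a CSV file?

Please explain the R commands to use, step by step. I could also do this in Tableau. What would the steps be there?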

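1 answer:

Answer 0 (score: 0)

OK, here is a potential dplyr / data.table / tidyr solution. The general idea is to use the list variable feature of dplyr (available since version 0.4.0). For each customer, we select the employees who were present at the time of the visit (using data.table's between() function) and store them in a list for that customer. We then unnest() the list variable, which duplicates the customer entry for each unique employee, and merge the employee information back in. The result is a data frame of unique customer-employee combinations.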

The code:

library(dplyr)
library(readr)
library(tidyr)
library(data.table)

#########
# For reproducibility: you can also download the .csv 
# from these Dropbox links using the 'repmis' pkg 
#
# customer <- repmis::source_DropboxData("customer.csv",
#                            "q0sf4uj13hpjz9v",
#                            sep = ",",
#                            header = TRUE)
# 
# staff <- repmis::source_DropboxData("staff.csv",
#                                     "q8p16hchsx8dzoa",
#                                     sep = ",",
#                                     header = TRUE)
##########    

# One problem with the original .csv is the formatting of the time: the
# hour is given with a single digit; not in the format 0+digit. We therefore 
# use '%k' in as.POSIXct() to parse the time instead of %H:

# Read both files from the working directory (or use the Dropbox download above):
customer <- read_csv("customer.csv") %>% 
  mutate(date = as.POSIXct(date, format = "%d-%m-%Y %k:%M", tz = "Europe/Berlin"))

staff <- read_csv("staff.csv")  %>% 
  mutate(start = as.POSIXct(start, format = "%d-%m-%Y %k:%M", tz = "Europe/Berlin"),
         end = as.POSIXct(end, format = "%d-%m-%Y %k:%M", tz = "Europe/Berlin"))

# Now we group by customer and copy for each customer 
# the list of employee names who were present at the date of the customer interaction:

staff_customer <- customer %>% 
  group_by(c.name) %>% # for each customer....
  mutate(employee = list(staff[data.table::between(date, staff$start, staff$end), c("employee", "Record ID")])) %>% # ... select all employees which were present during the customer's visit and store them in a list
  unnest() %>% # unnest this list using tidyr
  left_join(., staff) # copy the staff information back (if necessary)
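
The code above relies on the renamed columns (date, start, end, c.name) and the 24-hour times of the Dropbox files. For the tables exactly as printed in the question, which use AM/PM times and the original column names, a minimal base R sketch of the same interval match could look like the following. It swaps the list-column approach for a plain cross join followed by a filter, which is fine for small tables. The inline data is only an excerpt of the question's rows, and the helper parse_dt, the new column names and the output file name sales_with_staff.csv are assumptions made for illustration.

```r
# Excerpt of the two tables from the question, typed inline so the sketch
# runs on its own. In practice you would load the files instead, e.g.
#   sales <- read.csv("sales.csv", stringsAsFactors = FALSE)
sales <- data.frame(
  CustomerName     = c("Hendricks, Eric", "Baker, Mark", "Catalano, Robert"),
  InvoiceDate_Time = c("30-09-2015 1:00 PM", "30-09-2015 3:00 PM",
                       "01-10-2015 10:00 AM"),
  InvoiceNo        = c(10, 11, 12),
  InvoiceValue     = c(5000, 12000, 25000),
  stringsAsFactors = FALSE
)

staff <- data.frame(
  EmployeeName           = c("Page, Lisa", "Taylor, Hector",
                             "Dawson, Jonathan", "Duran, Brian"),
  Designation            = c("Sales Rep", "Manager", "Sales Rep", "Sales Rep"),
  AttendanceIn.DateTime  = c("30-09-2015 6:50 AM", "30-09-2015 7:00 AM",
                             "30-09-2015 1:55 PM", "01-10-2015 6:50 AM"),
  AttendanceOut.DateTime = c("30-09-2015 2:00 PM", "30-09-2015 5:00 PM",
                             "30-09-2015 7:00 PM", "01-10-2015 7:00 PM"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: parse "dd-mm-yyyy h:mm AM/PM" strings.
# %I is the 12-hour clock and %p the AM/PM marker; %p depends on the locale,
# so the function stops if any value fails to parse.
parse_dt <- function(x) {
  out <- as.POSIXct(x, format = "%d-%m-%Y %I:%M %p", tz = "UTC")
  if (anyNA(out)) stop("could not parse some date-times; check the locale")
  out
}

sales$invoice_time <- parse_dt(sales$InvoiceDate_Time)
staff$time_in      <- parse_dt(staff$AttendanceIn.DateTime)
staff$time_out     <- parse_dt(staff$AttendanceOut.DateTime)

# Cross join: merge() with by = NULL pairs every sale with every
# attendance record.
pairs <- merge(sales, staff, by = NULL)

# Keep only the pairs where the invoice time falls inside the shift.
keep    <- which(pairs$invoice_time >= pairs$time_in &
                 pairs$invoice_time <= pairs$time_out)
on_duty <- pairs[keep, ]

# Order by invoice and keep the columns of interest.
on_duty <- on_duty[order(on_duty$InvoiceNo, on_duty$EmployeeName),
                   c("InvoiceNo", "CustomerName", "InvoiceDate_Time",
                     "InvoiceValue", "EmployeeName", "Designation")]

# Save the result as a CSV file (the file name is an assumption).
write.csv(on_duty, "sales_with_staff.csv", row.names = FALSE)
```

With this excerpt, invoice 10 is matched to Page and Taylor, invoice 11 to Dawson and Taylor, and invoice 12 to Duran. The cross join grows with the product of the two row counts, so for large tables the between()-based approach above, or a non-equi join, is likely the better choice.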

Unfortunately, I don't know how this would work in Tableau.