> sales.csv
CustomerName InvoiceDate_Time InvoiceNo InvoiceValue
1 Hendricks, Eric 30-09-2015 1:00 PM 10 5000
2 Baker, Mark 30-09-2015 3:00 PM 11 12000
3 Catalano, Robert 01-10-2015 10:00 AM 12 25000
4 Eaton, Jeffrey 01-10-2015 4:00 PM 13 4000
5 Watanuki, Cathy 02-10-2015 9:00 AM 14 80000
6 Fier, Marilyn 02-10-2015 3:30 PM 15 18000
7 O'Brien, Donna 03-10-2015 1:30 PM 16 25000
8 Perez, Barney 03-10-2015 4:10 PM 17 20000
9 Fitzgerald, Jackie 04-10-2015 11:10 AM 18 6000
> StaffAttendance.csv
EmployeeName Designation AttendanceIn.DateTime AttendanceOut.DateTime
1 Page, Lisa Sales Rep 30-09-2015 6:50 AM 30-09-2015 2:00 PM
2 Taylor, Hector Manager 30-09-2015 7:00 AM 30-09-2015 5:00 PM
3 Dawson, Jonathan Sales Rep 30-09-2015 1:55 PM 30-09-2015 7:00 PM
4 Duran, Brian Sales Rep 01-10-2015 6:50 AM 01-10-2015 7:00 PM
5 Pratt, Erik Manager 01-10-2015 7:20 AM 01-10-2015 5:10 PM
6 Page, Lisa Sales Rep 02-10-2015 6:55 AM 02-10-2015 6:45 PM
7 Taylor, Hector Manager 02-10-2015 7:10 AM 02-10-2015 5:20 AM
8 Weber, Larry Sales Rep 03-10-2015 6:50 AM 03-10-2015 6:55 PM
9 Pratt, Erik Manager 04-10-2015 7:20 AM 04-10-2015 5:10 PM
10 Duran, Brian Sales Rep 04-10-2015 7:10 AM 04-10-2015 7:00 PM
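As shown above, I have two data tables (CSV files) that I want to combine using date and time. How can I use the date and time to find out which employees were at work for each sale to a customer?
How can I save the resulting table as a CSV file?
Please explain the R commands to use, step by step. I could also do this in Tableau. What are the steps there?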
Answer 0 (score: 0)
OK, here is a potential dplyr / data.table / tidyr solution.
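The general idea is to use the list variable feature of dplyr since version 0.4.0. For each customer, we select the employees who were present at the time of the visit (using data.table's between() function) and store them in a list for that customer. Then we unnest() the list variable (duplicating the customer entry for each unique employee) and merge the employee information back in. This results in a data frame of unique customer-employee combinations.
The following code builds that data frame: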
library(dplyr)
library(readr)
library(tidyr)
library(data.table)
#########
# For reproducibility: you can also download the .csv
# from these Dropbox links using the 'repmis' pkg
#
# customer <- repmis::source_DropboxData("customer.csv",
# "q0sf4uj13hpjz9v",
# sep = ",",
# header = TRUE)
#
# staff <- repmis::source_DropboxData("staff.csv",
# "q8p16hchsx8dzoa",
# sep = ",",
# header = TRUE)
##########
# One problem with the original .csv is the formatting of the time: the
# hour is given with a single digit; not in the format 0+digit. We therefore
# use '%k' in as.POSIXct() to parse the time instead of %H:
customer <- read_csv("customer.csv") %>%
  mutate(date = as.POSIXct(date, format = "%d-%m-%Y %k:%M", tz = "Europe/Berlin"))
# staff.csv is also available at https://www.dropbox.com/s/q8p16hchsx8dzoa/staff.csv?dl=1
staff <- read_csv("staff.csv") %>%
  mutate(start = as.POSIXct(start, format = "%d-%m-%Y %k:%M", tz = "Europe/Berlin"),
         end   = as.POSIXct(end,   format = "%d-%m-%Y %k:%M", tz = "Europe/Berlin"))
# Now we group by customer and copy for each customer
# the list of employee names who were present at the date of the customer interaction:
staff_customer <- customer %>%
  group_by(c.name) %>%  # for each customer (assumes one row per customer) ...
  # ... select all employees who were present during the customer's visit
  # and store them in a list column
  mutate(employee = list(staff[data.table::between(date, staff$start, staff$end),
                               c("employee", "Record ID")])) %>%
  unnest(employee) %>%  # expand the list column with tidyr: one row per customer-employee pair
  left_join(., staff)   # copy the staff information back (if necessary)
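The question also asks how to save the result as a CSV file. A minimal sketch, using base R's write.csv(): the small `result` data frame and the file name `staff_customer.csv` are made up for illustration, and in practice you would pass `staff_customer` from above instead.

```r
# Hypothetical stand-in for the staff_customer data frame built above
result <- data.frame(
  c.name   = c("Hendricks, Eric", "Hendricks, Eric"),
  employee = c("Page, Lisa", "Taylor, Hector"),
  stringsAsFactors = FALSE
)

# Illustrative output path; any writable location works
out_file <- file.path(tempdir(), "staff_customer.csv")

# row.names = FALSE keeps R's row numbers out of the file
write.csv(result, out_file, row.names = FALSE)

# Read the file back to confirm what was written
check <- read.csv(out_file, stringsAsFactors = FALSE)
```

Since readr is already loaded in this answer, `write_csv(staff_customer, "staff_customer.csv")` should do the same job.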
Unfortunately, I don't know how this would work in Tableau.
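Note that the tables printed in the question use a 12-hour clock with AM/PM, which would need `%I` and `%p` rather than `%k`. Below is a base-R sketch of the same interval match on a few rows typed in by hand from the question. The shortened column names `In` and `Out`, the helper `parse_dt`, and the `UTC` time zone are my own assumptions, and `%p` only recognises "AM"/"PM" in a locale that uses those strings (e.g. English).

```r
# A few rows copied by hand from the tables in the question
sales <- data.frame(
  CustomerName     = c("Hendricks, Eric", "Baker, Mark"),
  InvoiceDate_Time = c("30-09-2015 1:00 PM", "30-09-2015 3:00 PM"),
  InvoiceNo        = c(10, 11),
  stringsAsFactors = FALSE
)
staff <- data.frame(
  EmployeeName = c("Page, Lisa", "Taylor, Hector", "Dawson, Jonathan"),
  In  = c("30-09-2015 6:50 AM", "30-09-2015 7:00 AM", "30-09-2015 1:55 PM"),
  Out = c("30-09-2015 2:00 PM", "30-09-2015 5:00 PM", "30-09-2015 7:00 PM"),
  stringsAsFactors = FALSE
)

# %I = hour on a 12-hour clock, %p = AM/PM indicator (locale dependent)
parse_dt <- function(x) as.POSIXct(x, format = "%d-%m-%Y %I:%M %p", tz = "UTC")
sales$InvoiceDate_Time <- parse_dt(sales$InvoiceDate_Time)
staff$In  <- parse_dt(staff$In)
staff$Out <- parse_dt(staff$Out)

# Cross join: pair every sale with every attendance record ...
pairs <- merge(sales, staff, by = NULL)

# ... then keep only the pairs where the invoice time falls inside the shift
matched <- pairs[pairs$InvoiceDate_Time >= pairs$In &
                 pairs$InvoiceDate_Time <= pairs$Out, ]
matched <- matched[order(matched$InvoiceNo, matched$EmployeeName), ]
```

On these rows, invoice 10 is matched to Page and Taylor, and invoice 11 to Dawson and Taylor. The cross join is the simplest thing to write but grows with the product of the two table sizes, so for large files the grouped approach above may be preferable.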