Apologies if this has already been asked somewhere.
I have two data frames.
df1:
Account Name  Account Number  Product  Annual Contract Value  Date
Customer A    50261601        Banana   10000                  1/1/2015
Customer B    50208388        Orange   50000                  2/1/2015
Customer A    55795702        Apple    25000                  3/1/2015
Customer C    50217249        Pear     45000                  4/1/2015
Customer A    50378835        Orange   12000                  5/1/2015
Customer C    55123434        Banana   14000                  6/1/2015
Customer A    50438332        Banana   7500                   7/1/2015
Customer D    55131817        Peach    5600                   8/1/2015
Customer F    53765467        Plum     25000                  9/1/2015
Customer E    50990613        Banana   10000                  10/1/2015
Customer D    53846150        Orange   18000                  11/1/2015
Customer A    50234897        Apple    30000                  12/1/2015
and df2:
Date Product Account Sales
1/1/2015 Apple 55795702 500
2/1/2015 Apple 55795702 1000
3/1/2015 Apple 55795702 1500
4/1/2015 Apple 55795702 1000
5/1/2015 Apple 55795702 2000
6/1/2015 Apple 55795702 2500
7/1/2015 Apple 55795702 3000
8/1/2015 Apple 55795702 1001
9/1/2015 Apple 55795702 3500
10/1/2015 Apple 55795702 4000
11/1/2015 Apple 55795702 4500
12/1/2015 Apple 55795702 1002
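What I want to do in R is find out whether a customer is on track to reach their annual contract value within a given time frame. In this example, df1 shows that customer 55795702 signed an Apple contract worth $25,000 per year on 3/1. I want to find that customer in df2, find that product for that customer, and then return the sum of their purchases over the next three months to see whether they are on track to buy $25,000 worth per year.
Thanks, everyone.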
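To make the goal concrete, here is a minimal base-R sketch of the check described above for the single Apple contract in the example. It assumes that "on track" means the sales in the three calendar months after the contract month, multiplied by 4 to annualize, reach the annual contract value; the objects `contract`, `sales`, `month_starts`, `in_window` and `on_track` are hypothetical names, not part of the question.

```r
# Assumption: "on track" = (sales in the 3 months after the contract month) * 4
# is at least the annual contract value.
contract <- data.frame(AccountNumber = 55795702, Product = "Apple",
                       Value = 25000, Date = as.Date("2015-03-01"),
                       stringsAsFactors = FALSE)
sales <- data.frame(
  Date    = seq(as.Date("2015-01-01"), by = "month", length.out = 12),
  Product = "Apple",
  Account = 55795702,
  Sales   = c(500, 1000, 1500, 1000, 2000, 2500, 3000, 1001, 3500, 4000, 4500, 1002),
  stringsAsFactors = FALSE
)

# First days of the contract month and the four months that follow it
month_starts <- seq(contract$Date, by = "month", length.out = 5)

# Window = [first day of next month, first day of the 4th month): the contract
# month itself is excluded
in_window <- sales$Account == contract$AccountNumber &
             sales$Product == contract$Product &
             sales$Date >= month_starts[2] &
             sales$Date <  month_starts[5]

three_month_total <- sum(sales$Sales[in_window])
on_track <- three_month_total * 4 >= contract$Value
```

With the question's data the window covers April to June, so `three_month_total` is 1000 + 2000 + 2500 = 5500, and 5500 * 4 = 22000 falls short of 25000.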
Answer 0 (score: 0)
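This may be close to what you are looking for. For demonstration purposes, I modified your data: here df1 and df2 share two account numbers, 55131817 and 55795702. First, I converted the data frames to data.table objects. Then I converted Date to a Date object in both data sets. Next, I used merge() to get the data for the account numbers that exist in both df1 and df2. At this point you have two date columns: Date.x comes from df1 and Date.y from df2. Using these columns, I identified the start of each contract and took the data for the following three months. The month in which the contract was signed is excluded; if you want to include it, you need to modify the code. This step gives you the row indices for the subset. The final step summarizes the data for each account number. I added Value so that you can track how each customer is doing so far.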
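The row-index step can be seen in isolation on a toy logical vector. This is a hypothetical illustration, not data from the question: `is_contract_month` stands in for `Date.x == Date.y` after the merge, with rows 1 to 4 belonging to one account and rows 5 to 10 to another.

```r
# Hypothetical stand-in for Date.x == Date.y: TRUE marks the contract month
is_contract_month <- c(TRUE,  FALSE, FALSE, FALSE,                # account 1: rows 1-4
                       FALSE, FALSE, TRUE,  FALSE, FALSE, FALSE)  # account 2: rows 5-10

# Row numbers of the contract months
start_rows <- which(is_contract_month)

# For each contract row x, take the next three rows x + 1, x + 2, x + 3
idx <- unlist(lapply(start_rows, function(x) x + 1:3))
```

Here `start_rows` is 1 and 7, so `idx` is 2, 3, 4, 8, 9, 10. Because the offsets are positional, they assume each account has one row per month, sorted by date, with at least three rows after the contract month; otherwise the subset can pick up another account's rows or run past the end of the table.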
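library(data.table)
# Convert both data frames to data.tables and parse the m/d/Y date strings
setDT(df1)[, Date := as.Date(Date, format = "%m/%d/%Y")]
setDT(df2)[, Date := as.Date(Date, format = "%m/%d/%Y")]
# Join on account number; the two Date columns become Date.x (df1) and Date.y (df2)
merge(df1, df2, by.x = "AccountNumber", by.y = "Account")[
  # rows where a sales month equals the contract month, shifted to the next three rows
  unlist(lapply(which(Date.x == Date.y), function(x){x + 1:3})), ][,
  # total sales in those three months, alongside the annual contract value
  list(whatever = sum(Sales), Value = Value[1L]), by = "AccountNumber"]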
# AccountNumber whatever Value
#1: 55131817 12000 5600
#2: 55795702 5500 25000
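If the positional assumption does not hold, one alternative (a swapped-in technique, not the code above) is to select the window by date: join on product as well, as the question asks, and keep the sales whose month is one to three months after the contract month. This base-R sketch reuses the modified data from this answer; `month_index`, `months_after`, `win`, `res` and the `OnTrack` column are hypothetical, and `OnTrack` assumes that annualizing the three-month total by multiplying it by 4 is the right comparison.

```r
# Base-R sketch: date-based three-month window, joined on account AND product
df1 <- read.table(text = "AccountName AccountNumber ProductAnnualContract Value Date
CustomerA 50261601 Banana 10000 1/1/2015
CustomerB 50208388 Orange 50000 2/1/2015
CustomerA 55795702 Apple 25000 3/1/2015
CustomerC 50217249 Pear 45000 4/1/2015
CustomerA 50378835 Orange 12000 5/1/2015
CustomerC 55123434 Banana 14000 6/1/2015
CustomerA 50438332 Banana 7500 7/1/2015
CustomerD 55131817 Peach 5600 8/1/2015
CustomerF 53765467 Plum 25000 9/1/2015
CustomerE 50990613 Banana 10000 10/1/2015
CustomerD 53846150 Orange 18000 11/1/2015
CustomerA 50234897 Apple 30000 12/1/2015", header = TRUE, stringsAsFactors = FALSE)
df2 <- read.table(text = "Date Product Account Sales
1/1/2015 Apple 55795702 500
2/1/2015 Apple 55795702 1000
3/1/2015 Apple 55795702 1500
4/1/2015 Apple 55795702 1000
5/1/2015 Apple 55795702 2000
6/1/2015 Apple 55795702 2500
7/1/2015 Apple 55795702 3000
8/1/2015 Peach 55131817 1001
9/1/2015 Peach 55131817 3500
10/1/2015 Peach 55131817 4000
11/1/2015 Peach 55131817 4500
12/1/2015 Apple 55795702 1002", header = TRUE, stringsAsFactors = FALSE)

df1$Date <- as.Date(df1$Date, format = "%m/%d/%Y")
df2$Date <- as.Date(df2$Date, format = "%m/%d/%Y")

# Join on account number and product; Date becomes Date.x (df1) and Date.y (df2)
m <- merge(df1, df2,
           by.x = c("AccountNumber", "ProductAnnualContract"),
           by.y = c("Account", "Product"))

# Hypothetical helper: a running month count, so month differences are integers
month_index <- function(d) {
  as.integer(format(d, "%Y")) * 12L + as.integer(format(d, "%m"))
}
months_after <- month_index(m$Date.y) - month_index(m$Date.x)

# Keep the three months after the contract month (contract month excluded)
win <- m[months_after >= 1 & months_after <= 3, ]

# Total sales per contract, next to its annual contract value
res <- aggregate(Sales ~ AccountNumber + Value, data = win, FUN = sum)
res$OnTrack <- res$Sales * 4 >= res$Value  # assumption: annualize by multiplying by 4
res
```

With this data it gives the same totals as the data.table version (12000 for 55131817 and 5500 for 55795702), but it no longer depends on row order or on every month being present.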
Data
df1
df1 <- read.table(text = "AccountName AccountNumber ProductAnnualContract Value Date
CustomerA 50261601 Banana 10000 1/1/2015
CustomerB 50208388 Orange 50000 2/1/2015
CustomerA 55795702 Apple 25000 3/1/2015
CustomerC 50217249 Pear 45000 4/1/2015
CustomerA 50378835 Orange 12000 5/1/2015
CustomerC 55123434 Banana 14000 6/1/2015
CustomerA 50438332 Banana 7500 7/1/2015
CustomerD 55131817 Peach 5600 8/1/2015
CustomerF 53765467 Plum 25000 9/1/2015
CustomerE 50990613 Banana 10000 10/1/2015
CustomerD 53846150 Orange 18000 11/1/2015
CustomerA 50234897 Apple 30000 12/1/2015", header = TRUE)
AccountName AccountNumber ProductAnnualContract Value Date
1: CustomerA 50261601 Banana 10000 2015-01-01
2: CustomerB 50208388 Orange 50000 2015-02-01
3: CustomerA 55795702 Apple 25000 2015-03-01
4: CustomerC 50217249 Pear 45000 2015-04-01
5: CustomerA 50378835 Orange 12000 2015-05-01
6: CustomerC 55123434 Banana 14000 2015-06-01
7: CustomerA 50438332 Banana 7500 2015-07-01
8: CustomerD 55131817 Peach 5600 2015-08-01
9: CustomerF 53765467 Plum 25000 2015-09-01
10: CustomerE 50990613 Banana 10000 2015-10-01
11: CustomerD 53846150 Orange 18000 2015-11-01
12: CustomerA 50234897 Apple 30000 2015-12-01
df2
df2 <- read.table(text = "Date Product Account Sales
1/1/2015 Apple 55795702 500
2/1/2015 Apple 55795702 1000
3/1/2015 Apple 55795702 1500
4/1/2015 Apple 55795702 1000
5/1/2015 Apple 55795702 2000
6/1/2015 Apple 55795702 2500
7/1/2015 Apple 55795702 3000
8/1/2015 Peach 55131817 1001
9/1/2015 Peach 55131817 3500
10/1/2015 Peach 55131817 4000
11/1/2015 Peach 55131817 4500
12/1/2015 Apple 55795702 1002", header = TRUE)
Date Product Account Sales
1: 2015-01-01 Apple 55795702 500
2: 2015-02-01 Apple 55795702 1000
3: 2015-03-01 Apple 55795702 1500
4: 2015-04-01 Apple 55795702 1000
5: 2015-05-01 Apple 55795702 2000
6: 2015-06-01 Apple 55795702 2500
7: 2015-07-01 Apple 55795702 3000
8: 2015-08-01 Peach 55131817 1001
9: 2015-09-01 Peach 55131817 3500
10: 2015-10-01 Peach 55131817 4000
11: 2015-11-01 Peach 55131817 4500
12: 2015-12-01 Apple 55795702 1002