Return date values by multiple attributes

Time: 2016-02-25 13:26:23

Tags: r

Apologies if this has already been asked somewhere.

I have two data frames.

df1:

Account Name  Account Number  Product  Annual Contract Value  Date
Customer A    50261601        Banana   10000                  1/1/2015
Customer B    50208388        Orange   50000                  2/1/2015
Customer A    55795702        Apple    25000                  3/1/2015
Customer C    50217249        Pear     45000                  4/1/2015
Customer A    50378835        Orange   12000                  5/1/2015
Customer C    55123434        Banana   14000                  6/1/2015
Customer A    50438332        Banana   7500                   7/1/2015
Customer D    55131817        Peach    5600                   8/1/2015
Customer F    53765467        Plum     25000                  9/1/2015
Customer E    50990613        Banana   10000                  10/1/2015
Customer D    53846150        Orange   18000                  11/1/2015
Customer A    50234897        Apple    30000                  12/1/2015

And df2:

Date        Product  Account   Sales
1/1/2015    Apple    55795702  500
2/1/2015    Apple    55795702  1000
3/1/2015    Apple    55795702  1500
4/1/2015    Apple    55795702  1000
5/1/2015    Apple    55795702  2000
6/1/2015    Apple    55795702  2500
7/1/2015    Apple    55795702  3000
8/1/2015    Apple    55795702  1001
9/1/2015    Apple    55795702  3500
10/1/2015   Apple    55795702  4000
11/1/2015   Apple    55795702  4500
12/1/2015   Apple    55795702  1002

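What I want to do in R is find out whether a customer has reached their annual contract value within a given time frame. In this example, df1 shows that account 55795702 signed an Apple contract with an annual value of $25,000 on 3/1. I want to find that account in df2, find that product for that account, and then return the sum of its purchases over the next three months, to see whether it is on track to buy the $25,000 annual value.

Thanks, everyone.

1 Answer:

Answer 0: (score: 0)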

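This may be close to what you are looking for. I modified your data for demonstration purposes: two account numbers, 55131817 and 55795702, now appear in both df1 and df2. First, I converted the data frames to data.table objects and converted Date to Date objects in both data sets. Then I used merge(), which keeps only the account numbers that exist in both df1 and df2. At this point there are two date columns: Date.x comes from df1 and Date.y comes from df2. Using these columns, I identified the row where each contract starts and took the data for the following three months. The month in which the contract was signed is excluded; if you want to include it, you need to modify the code. This step produces the row indices used for subsetting. The final step sums the sales for each account number. I also kept Value so that you can see how each customer is doing so far against the contract.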

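library(data.table)

# Convert both data frames to data.tables by reference and parse Date (month/day/year)
setDT(df1)[, Date := as.Date(Date, format = "%m/%d/%Y")]
setDT(df2)[, Date := as.Date(Date, format = "%m/%d/%Y")]

# 1) Join on account number; Date.x is the contract date (df1), Date.y the sale date (df2)
# 2) Locate each contract-start row and take the three rows after it
#    (this relies on one row per month, in date order, within each account)
# 3) Sum Sales per account and keep the contract Value for comparison
merge(df1, df2, by.x = "AccountNumber", by.y = "Account")[
    unlist(lapply(which(Date.x == Date.y), function(x){x + 1:3})), ][,
    list(whatever = sum(Sales), Value = Value[1L]), by = "AccountNumber"]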

#   AccountNumber whatever Value
#1:      55131817    12000  5600
#2:      55795702     5500 25000
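
The row-offset step above depends on every account having one row per month in date order. As a rough alternative, here is a minimal sketch in base R that selects the window by date instead and also matches on product, as the question asks. The helper names add_months() and window_sales(), the include_start switch, and the Projected/OnTrack columns are my own illustrations and not part of the answer's code; in particular, scaling the window total up to twelve months is only one assumed reading of "on track". The data are a cut-down copy of the demo data, with the contract date column named Start.

```r
# Cut-down copy of the demo data: the two contracts that have sales in df2.
contracts <- data.frame(
  AccountNumber = c(55795702, 55131817),
  Product       = c("Apple", "Peach"),
  Value         = c(25000, 5600),
  Start         = as.Date(c("3/1/2015", "8/1/2015"), format = "%m/%d/%Y")
)
sales <- data.frame(
  Date    = as.Date(paste0(1:12, "/1/2015"), format = "%m/%d/%Y"),
  Product = c(rep("Apple", 7), rep("Peach", 4), "Apple"),
  Account = c(rep(55795702, 7), rep(55131817, 4), 55795702),
  Sales   = c(500, 1000, 1500, 1000, 2000, 2500, 3000,
              1001, 3500, 4000, 4500, 1002)
)

# Hypothetical helper: shift dates by n calendar months. Days past the end
# of a shorter month roll over into the following month.
add_months <- function(d, n) {
  lt <- as.POSIXlt(d)
  lt$mon <- lt$mon + n
  as.Date(lt)
}

# Hypothetical helper: total sales in the `months` months around each
# contract start, matching on account AND product.
window_sales <- function(contracts, sales, months = 3, include_start = FALSE) {
  m <- merge(contracts, sales,
             by.x = c("AccountNumber", "Product"),
             by.y = c("Account", "Product"))
  if (include_start) {
    # Window is [Start, Start + months)
    keep <- m$Date >= m$Start & m$Date < add_months(m$Start, months)
  } else {
    # Window is (Start, Start + months], i.e. the signing month is excluded
    keep <- m$Date > m$Start & m$Date <= add_months(m$Start, months)
  }
  out <- aggregate(Sales ~ AccountNumber + Product + Value,
                   data = m[keep, ], FUN = sum)
  # Assumption: "on track" means the window total, scaled to twelve months,
  # reaches the annual contract value.
  out$Projected <- out$Sales * 12 / months
  out$OnTrack   <- out$Projected >= out$Value
  out
}

res <- window_sales(contracts, sales)
print(res)
```

With the defaults this gives the same three-month totals as the output above (12000 for 55131817 and 5500 for 55795702). Calling window_sales(contracts, sales, include_start = TRUE) counts the signing month as the first of the three months instead.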

Data

df1

df1 <- read.table(text = "AccountName    AccountNumber  ProductAnnualContract Value   Date
CustomerA  50261601    Banana  10000   1/1/2015
CustomerB  50208388    Orange  50000   2/1/2015
CustomerA  55795702    Apple   25000   3/1/2015
CustomerC  50217249    Pear    45000   4/1/2015
CustomerA  50378835    Orange  12000   5/1/2015
CustomerC  55123434    Banana  14000   6/1/2015
CustomerA  50438332    Banana  7500    7/1/2015
CustomerD  55131817    Peach   5600    8/1/2015
CustomerF  53765467    Plum    25000   9/1/2015
CustomerE  50990613    Banana  10000   10/1/2015
CustomerD  53846150    Orange  18000   11/1/2015
CustomerA  50234897    Apple   30000   12/1/2015", header = TRUE)


    AccountName AccountNumber ProductAnnualContract Value       Date
 1:   CustomerA      50261601                Banana 10000 2015-01-01
 2:   CustomerB      50208388                Orange 50000 2015-02-01
 3:   CustomerA      55795702                 Apple 25000 2015-03-01
 4:   CustomerC      50217249                  Pear 45000 2015-04-01
 5:   CustomerA      50378835                Orange 12000 2015-05-01
 6:   CustomerC      55123434                Banana 14000 2015-06-01
 7:   CustomerA      50438332                Banana  7500 2015-07-01
 8:   CustomerD      55131817                 Peach  5600 2015-08-01
 9:   CustomerF      53765467                  Plum 25000 2015-09-01
10:   CustomerE      50990613                Banana 10000 2015-10-01
11:   CustomerD      53846150                Orange 18000 2015-11-01
12:   CustomerA      50234897                 Apple 30000 2015-12-01

df2

df2 <- read.table(text = "Date    Product Account Sales
1/1/2015    Apple   55795702    500
2/1/2015    Apple   55795702    1000
3/1/2015    Apple   55795702    1500
4/1/2015    Apple   55795702    1000
5/1/2015    Apple   55795702    2000
6/1/2015    Apple   55795702    2500
7/1/2015    Apple   55795702    3000
8/1/2015    Peach   55131817    1001
9/1/2015    Peach   55131817    3500
10/1/2015   Peach   55131817    4000
11/1/2015   Peach   55131817    4500
12/1/2015   Apple   55795702    1002", header = TRUE)

          Date Product  Account Sales
 1: 2015-01-01   Apple 55795702   500
 2: 2015-02-01   Apple 55795702  1000
 3: 2015-03-01   Apple 55795702  1500
 4: 2015-04-01   Apple 55795702  1000
 5: 2015-05-01   Apple 55795702  2000
 6: 2015-06-01   Apple 55795702  2500
 7: 2015-07-01   Apple 55795702  3000
 8: 2015-08-01   Peach 55131817  1001
 9: 2015-09-01   Peach 55131817  3500
10: 2015-10-01   Peach 55131817  4000
11: 2015-11-01   Peach 55131817  4500
12: 2015-12-01   Apple 55795702  1002