Based on other conditions

Time: 2016-12-01 20:09:55

Tags: r

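I have an orders table and an account activity table. For each order, I want to find the most recent activity on that order's account. In other words, I want to iterate over each order, find the activities with a matching account, and pick the one whose date is closest to the order date.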

rec   type      date         account   nearest.rec
1     Order     12/1/2016    A
2     Order     11/14/2016   B
3     Activity  11/13/2016   A
4     Activity  10/15/2016   C
5     Order     11/13/2016   C
6     Activity  11/16/2016   A
7     Activity  11/17/2016   A
8     Activity  10/14/2016   B
9     Activity  11/4/2016    B

I would like to turn it into this:

rec   type      date         account   nearest.rec.actv
1     Order     12/1/2016    A         7
2     Order     11/14/2016   B         9
3     Activity  11/13/2016   A
4     Activity  10/15/2016   C
5     Order     11/13/2016   C         4
6     Activity  11/16/2016   A
7     Activity  11/17/2016   A
8     Activity  10/14/2016   B
9     Activity  11/4/2016    B

Or turn it into its own data frame:

rec   type      date         account   nearest.rec.actv  actv.date
1     Order     12/1/2016    A         7                 11/17/2016
2     Order     11/14/2016   B         9                 11/4/2016
5     Order     11/13/2016   C         4                 10/15/2016
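
For anyone who wants to reproduce this, here is a minimal sketch of the sample data as an R data frame. The name `df` is chosen to match the answers; keeping `date` as month/day/year text and turning off factors are assumptions about how the data was loaded.

```r
# Sample data copied from the first table in the question.
# `date` is left as text here; it still needs as.Date() before comparing.
df <- data.frame(
  rec     = 1:9,
  type    = c("Order", "Order", "Activity", "Activity", "Order",
              "Activity", "Activity", "Activity", "Activity"),
  date    = c("12/1/2016", "11/14/2016", "11/13/2016", "10/15/2016",
              "11/13/2016", "11/16/2016", "11/17/2016", "10/14/2016",
              "11/4/2016"),
  account = c("A", "B", "A", "C", "C", "A", "A", "B", "B"),
  stringsAsFactors = FALSE
)
```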

3 answers:

Answer 0: (score: 1)

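Split the data by type, join the two parts on account, then filter each account down to the row with the smallest date difference.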

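library(dplyr)
# Parse the month/day/year strings into Date objects
df$date <- as.Date(df$date, "%m/%d/%Y")
# Split into orders and activities
ind <- df$type == "Order"
df1 <- df[ind, ]
df2 <- df[!ind, ]
# Pair each order with every activity on the same account, then keep the
# pair with the smallest gap (this relies on one order per account and on
# no activity being dated after its order)
left_join(df1, df2, by = "account") %>%
  group_by(account) %>%
  filter(date.x - date.y == min(date.x - date.y))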

#  rec.x type.x     date.x account rec.y   type.y     date.y
#  <int>  <chr>     <date>   <chr> <int>    <chr>     <date>
#1     1  Order 2016-12-01       A     7 Activity 2016-11-17
#2     2  Order 2016-11-14       B     9 Activity 2016-11-04
#3     5  Order 2016-11-13       C     4 Activity 2016-10-15
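
The filter above works for the sample data, but it would pick the wrong row if an activity were dated after its order, and it mixes rows together if an account has several orders. Below is a rough sketch of the same split/join/filter idea in base R that takes the absolute gap and groups by order record instead. Treating "nearest" as the absolute distance is an assumption about what the question wants, and the names `orders`, `activities`, `joined`, `gap` and `nearest` are illustrative, not from the answer. The sample data is rebuilt so the block runs on its own.

```r
df <- data.frame(
  rec     = 1:9,
  type    = c("Order", "Order", "Activity", "Activity", "Order",
              "Activity", "Activity", "Activity", "Activity"),
  date    = as.Date(c("12/1/2016", "11/14/2016", "11/13/2016", "10/15/2016",
                      "11/13/2016", "11/16/2016", "11/17/2016", "10/14/2016",
                      "11/4/2016"), "%m/%d/%Y"),
  account = c("A", "B", "A", "C", "C", "A", "A", "B", "B"),
  stringsAsFactors = FALSE
)

orders     <- df[df$type == "Order", ]
activities <- df[df$type == "Activity", ]

# Pair every order with every activity on the same account (inner join,
# so an order whose account has no activity is dropped).
joined <- merge(orders, activities, by = "account", suffixes = c(".x", ".y"))

# Absolute gap in days between the order and each candidate activity.
joined$gap <- abs(as.numeric(joined$date.x - joined$date.y))

# Per order record, keep the row(s) with the smallest gap; ties are all kept.
nearest <- joined[joined$gap == ave(joined$gap, joined$rec.x, FUN = min), ]
nearest <- nearest[order(nearest$rec.x),
                   c("rec.x", "date.x", "account", "rec.y", "date.y")]
print(nearest)
```

If only activities on or before the order date should count, one option is to drop rows where `date.y > date.x` before computing the minimum.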

Answer 1: (score: 0)

This is not an efficient answer, but going through it step by step may help:

library(dplyr)

# Subset into two data frames (df$date must already be of class Date)
df1 <- df[df$type == "Order",]
df2 <- df[df$type == "Activity",]

# Logic inside mutate(): compute the time difference between the order and
# each activity on the same account, find the minimum, and pull out the
# corresponding activity date (x) and record number (y)
df1 %>% group_by(account) %>% 
  mutate(x = df2$date[account==df2$account][which(min(difftime(date, df2$date[account == df2$account])) == difftime(date, df2$date[account == df2$account]))],
         y = df2$rec[account==df2$account][which(min(difftime(date, df2$date[account == df2$account])) == difftime(date, df2$date[account == df2$account]))])

#    rec  type       date account          x     y
#  <int> <chr>     <date>   <chr>     <date> <int>
#1     1 Order 2016-12-01       A 2016-11-17     7
#2     2 Order 2016-11-14       B 2016-11-04     9
#3     5 Order 2016-11-13       C 2016-10-15     4
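
The repeated expression inside `mutate()` is hard to read. As a sketch of the same step-by-step logic, the lookup can be pulled into a small helper and applied to each order in base R. `nearest_activity` is a hypothetical name, the absolute gap is an assumption about what "nearest" should mean, and the sample data is rebuilt so the block is self-contained.

```r
df <- data.frame(
  rec     = 1:9,
  type    = c("Order", "Order", "Activity", "Activity", "Order",
              "Activity", "Activity", "Activity", "Activity"),
  date    = as.Date(c("12/1/2016", "11/14/2016", "11/13/2016", "10/15/2016",
                      "11/13/2016", "11/16/2016", "11/17/2016", "10/14/2016",
                      "11/4/2016"), "%m/%d/%Y"),
  account = c("A", "B", "A", "C", "C", "A", "A", "B", "B"),
  stringsAsFactors = FALSE
)

orders     <- df[df$type == "Order", ]
activities <- df[df$type == "Activity", ]

# Hypothetical helper: return the `rec` of the activity on account `acct`
# whose date is closest to `order_date`, or NA if the account has none.
nearest_activity <- function(order_date, acct) {
  cand <- activities[activities$account == acct, ]
  if (nrow(cand) == 0) return(NA_integer_)
  gaps <- abs(as.numeric(difftime(order_date, cand$date, units = "days")))
  cand$rec[which.min(gaps)]  # on ties, the first candidate wins
}

# Look up the nearest activity for each order, one row at a time.
orders$nearest.rec.actv <- vapply(
  seq_len(nrow(orders)),
  function(i) nearest_activity(orders$date[i], orders$account[i]),
  integer(1)
)

# Attach the date of the matched activity.
orders$actv.date <- activities$date[match(orders$nearest.rec.actv,
                                          activities$rec)]
print(orders)
```

This yields the "own data frame" layout from the question, with one row per order plus the matched activity record and date.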

Answer 2: (score: 0)

Here is my solution using purrr and dplyr: