Date differences in a data frame

Time: 2017-01-27 11:29:26

Tags: r

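I need to know how many visitors never came back after 2 days. This is a first-time-visitor analysis. I have a 6-month period, from July to December, and anyone with visit number = 1 within that period is considered a first-time visitor.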

Suppose I have the following simple data frame:

data.frame("Date"=c("July 1, 2016","July 1, 2016","July 1, 2016","July 2, 2016","July 2, 2016","July 3, 2016","July 3, 2016","July 3, 2016",
                    "July 4, 2016","July 5, 2016","July 6, 2016"),
           "UserID"=c(1, 1, 2, 3, 1, 3, 2, 2, 2, 3, 3),
           "Visit No"=c(1, 2, 1, 1, 1, 4, 1, 1, 6, 7, 20))

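How can I find out how many first-time visitors did not come back after 2 days?

In my simple example, the first-time visitor who never came back after 2 days appears to be UserID 1, because his last visit was on July 2, 2016 and he has not returned since.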

2 answers:

Answer 0: (score: 1)

library(lubridate)

a <- data.frame("Date"=c("July 1, 2016","July 1, 2016","July 1, 2016","July 2, 2016","July 2, 2016","July 3, 2016","July 3, 2016","July 3, 2016",
                                    "July 4, 2016","July 5, 2016","July 6, 2016"),
                                 "UserID"=c(1, 1, 2, 3, 1, 3, 2, 2, 2, 3, 3),
                                 "Visit No"=c(1, 2, 1, 1, 1, 4, 1, 1, 6, 7, 20))

# parse the text dates; POSIXct is safer than POSIXlt as a data frame column
a$ParsedDate <- as.POSIXct(strptime(a$Date, "%B %d, %Y", tz = "UTC"))

# unique UserIDs to loop over
d <- unique(a$UserID)

for (id in d)
{
 # rows for this UserID
 adfPerUser <- a[a$UserID == id, ]

 # earliest moment that counts as "two days after the first visit"
 cutoff <- min(adfPerUser$ParsedDate) + days(2)

 # visits by this UserID on or after the cutoff
 adfPerUser2days <- adfPerUser[adfPerUser$ParsedDate >= cutoff, ]

 if (nrow(adfPerUser2days) >= 1)
 {
   cat(sprintf("User ID = %d has visited at least once two or more days after the first visit\n", id))
 }
}

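The loop above prints the users who did come back, whereas the question asks for those who did not. A minimal base R sketch of that complement follows. It assumes "came back" means any visit two or more days after the user's first visit, and the names `visits`, `day_number`, `first_visit`, `last_visit` and `not_returned` are illustrative, not part of the original answer.

```r
# Same sample data as in the answer (the Visit No column is not needed here)
visits <- data.frame(
  Date = c("July 1, 2016", "July 1, 2016", "July 1, 2016", "July 2, 2016",
           "July 2, 2016", "July 3, 2016", "July 3, 2016", "July 3, 2016",
           "July 4, 2016", "July 5, 2016", "July 6, 2016"),
  UserID = c(1, 1, 2, 3, 1, 3, 2, 2, 2, 3, 3)
)

# Parse the dates (%B expects English month names in the current locale)
# and turn them into plain day counts so tapply() keeps simple numbers
day_number <- as.numeric(as.Date(visits$Date, format = "%B %d, %Y"))

# First and last visit day per user
first_visit <- tapply(day_number, visits$UserID, min)
last_visit  <- tapply(day_number, visits$UserID, max)

# Users whose last visit is less than 2 days after their first visit
not_returned <- as.numeric(names(first_visit)[last_visit - first_visit < 2])

not_returned          # UserIDs that never came back
length(not_returned)  # how many such users there are
```

With the sample data this flags only UserID 1, which matches the expectation stated in the question.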

Answer 1: (score: 0)

library(dplyr)
library(lubridate)


dt <- data.frame("Date"=c("July 1, 2016","July 1, 2016","July 1, 2016","July 2, 2016","July 2, 2016","July 3, 2016","July 3, 2016","July 3, 2016",
                         "July 4, 2016","July 5, 2016","July 6, 2016"),
                "UserID"=c(1, 1, 2, 3, 1, 3, 2, 2, 2, 3, 3),
                "Visit No"=c(1, 2, 1, 1, 1, 4, 1, 1, 6, 7, 20))

dt %>%
  mutate(Date = mdy(Date)) %>%     # update to date format
  group_by(UserID) %>%             # for each user id
  mutate(Date_Next = lead(Date, default=max(mdy(dt$Date))),    # get date of next visit. if there's no next visit consider the latest date in the dataset
         Date_Diff = as.numeric(difftime(Date_Next, Date, units="days"))) %>%    # calculate difference between dates
  ungroup() %>%                    # forget the grouping
  filter(Date_Diff > 2)            # return cases where difference is more than 2 days

# # A tibble: 1 × 5
#           Date UserID Visit.No  Date_Next Date_Diff
#          <date>  <dbl>    <dbl>     <date>     <dbl>
#   1 2016-07-02      1        1 2016-07-06         4

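This process returns the CASES where a user did not return within 2 days, not the USERS. If a user repeatedly stays away for 3 or more days, the same user appears in several rows, so you may need to extract the unique user IDs from this output.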
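One way to get from cases to users is to append `distinct(UserID)` to the same pipeline. This is a sketch built on the answer's code; the name `users_not_back` is illustrative.

```r
library(dplyr)
library(lubridate)

dt <- data.frame("Date"=c("July 1, 2016","July 1, 2016","July 1, 2016","July 2, 2016","July 2, 2016","July 3, 2016","July 3, 2016","July 3, 2016",
                         "July 4, 2016","July 5, 2016","July 6, 2016"),
                "UserID"=c(1, 1, 2, 3, 1, 3, 2, 2, 2, 3, 3),
                "Visit No"=c(1, 2, 1, 1, 1, 4, 1, 1, 6, 7, 20))

users_not_back <- dt %>%
  mutate(Date = mdy(Date)) %>%
  group_by(UserID) %>%
  mutate(Date_Next = lead(Date, default = max(mdy(dt$Date))),   # next visit, or latest date in the data
         Date_Diff = as.numeric(difftime(Date_Next, Date, units = "days"))) %>%
  ungroup() %>%
  filter(Date_Diff > 2) %>%        # gaps longer than 2 days
  distinct(UserID)                 # one row per user, however many gaps they have

nrow(users_not_back)               # number of users who stayed away for more than 2 days
```

Note that this counts a user who has any gap longer than 2 days, including users who eventually returned. Whether that matches "never came back" depends on how the analysis defines a return.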