Calculate the mean of date differences

Time: 2018-06-04 16:49:33

Tags: pyspark

I have the following data frame containing two variables, id_client and purchase_date:

(10509609,2011-06-04) ;
(10509609,2011-03-14) ;
(10509609,2006-06-03) ;
(10509609,2006-06-03) ;
(10509609,2006-06-03) ;
(10509609,2006-06-03) ;
(10509609,2006-06-03) ;
(10509609,2002-03-09) ;
(10509618,2016-02-22) ;
(10509618,2015-04-15) ;

I have ordered the dataframe by id and ascending date.

I have not managed to create a data frame with one row per distinct id that holds, for that id, the mean of the differences between successive dates, in days.

For instance, for the first id 10509609, the result should be [(2011-06-04 - 2011-03-14) + (2011-03-14 - 2006-06-03) + ... + (2006-06-03 - 2002-03-09)] / 7
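
To make the expected result concrete, here is a minimal sketch rather than a definitive solution. `mean_gap_days` is a plain-Python reference for the arithmetic above, and `mean_gap_per_client` is one possible PySpark formulation using `lag` over a window. Both function names are hypothetical, and the sketch assumes the columns are named id_client and purchase_date and that purchase_date is a date column or an ISO `yyyy-MM-dd` string.

```python
from datetime import date


def mean_gap_days(dates):
    """Plain-Python reference: mean of successive differences, in days."""
    ordered = sorted(dates)
    gaps = [(later - earlier).days for earlier, later in zip(ordered, ordered[1:])]
    # A client with a single purchase has no gap to average.
    return sum(gaps) / len(gaps) if gaps else None


def mean_gap_per_client(df):
    """PySpark sketch: one row per id_client with the mean gap in days.

    Assumes df has columns id_client and purchase_date (names taken from
    the question). pyspark is imported here so that the reference function
    above can be used without Spark installed.
    """
    from pyspark.sql import Window, functions as F

    # Previous purchase date of the same client, in chronological order.
    w = Window.partitionBy("id_client").orderBy("purchase_date")
    with_gap = df.withColumn(
        "gap_days",
        F.datediff(F.col("purchase_date"), F.lag("purchase_date").over(w)),
    )
    # The first row of each client has a null gap; avg ignores nulls.
    return with_gap.groupBy("id_client").agg(
        F.avg("gap_days").alias("mean_gap_days")
    )


# The sample rows from the question, grouped by client.
purchases = {
    10509609: [
        date(2011, 6, 4), date(2011, 3, 14),
        date(2006, 6, 3), date(2006, 6, 3), date(2006, 6, 3),
        date(2006, 6, 3), date(2006, 6, 3),
        date(2002, 3, 9),
    ],
    10509618: [date(2016, 2, 22), date(2015, 4, 15)],
}

expected = {cid: mean_gap_days(ds) for cid, ds in purchases.items()}
print(expected)
```

Because successive differences telescope, their sum is just the last date minus the first, so the mean also equals (max date - min date) / (number of rows - 1). For id 10509609 that is 3374 days / 7 = 482 days. Repeated dates contribute gaps of 0 but still count in the denominator.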

0 answers:

No answers