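Dataset manipulation with factor and time variables in R

Time: 2017-06-19 00:46:08

Tags: r datetime dataset dplyr apply

I have a simple question about data manipulation. Given the following dataset: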

n = c("john","jane","tim","john","jimmy","tim","jane","john","jimmy")
s = c("2012-03-21","2013-02-12","2014-01-01","2012-05-21","2010-12-17","2012-01-21","2013-03-12","2013-08-21","2010-09-17")

df = data.frame(n,s)
     n      s
1  john 2012-03-21
2  jane 2013-02-12
3   tim 2014-01-01
4  john 2012-05-21
5 jimmy 2010-12-17
6   tim 2012-01-21
7  jane 2013-03-12
8  john 2013-08-21
9 jimmy 2010-09-17

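I want to create a third column that holds, for each person, the number of months elapsed since that person's earliest date. It would look like this: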

         n      s        output
    1  john 2012-03-21     0
    2  jane 2013-02-12     0
    3   tim 2014-01-01     24
    4  john 2012-05-21     2
    5 jimmy 2010-12-17     3
    6   tim 2012-01-21     0
    7  jane 2013-03-12     1
    8  john 2013-08-21    17
    9 jimmy 2010-09-17     0

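As you can see, taking john as an example, the earliest date is 2012-03-21, so the months are counted from 2012-03-21 to 2012-05-21 and then to 2013-08-21, and each result is placed in the corresponding row.

I think dplyr or the apply functions would come in handy, but I find myself writing quite a lot of code for something that shouldn't be that hard.

Thanks for your help.

2 answers:

Answer 0: (score: 2)

In my answer I use the lubridate package to make sure the s column of df is not treated as a string or a factor: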

library(dplyr)
library(lubridate)
df$s = as_date(df$s)

Create a separate data frame holding each person's start date:

df.startdate = df %>% group_by(n) %>% summarise(start_date = min(s))

Now merge the main df with the newly built df.startdate and count the whole months between the start date and each date:

answer = merge(df, df.startdate, by = "n") %>% 
    mutate(output = interval(start_date, s) %/% months(1))
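The same calculation can probably be done without the intermediate data frame. The following is a minimal sketch, not the answerer's code: it assumes a grouped `mutate()` with `min(s)` as the per-person start date, and it rebuilds the question's data so that it runs on its own. Two things to be aware of: `merge()` sorts the result by `n`, whereas this version keeps the original row order, and `interval(...) %/% months(1)` counts completed months, so tim's 2014-01-01 row should come out as 23 rather than the 24 shown in the question's expected output.

```r
library(dplyr)
library(lubridate)

# Data from the question, with s parsed as dates up front.
df <- data.frame(
  n = c("john", "jane", "tim", "john", "jimmy", "tim", "jane", "john", "jimmy"),
  s = as_date(c("2012-03-21", "2013-02-12", "2014-01-01", "2012-05-21", "2010-12-17",
                "2012-01-21", "2013-03-12", "2013-08-21", "2010-09-17")),
  stringsAsFactors = FALSE
)

answer <- df %>%
  group_by(n) %>%
  # min(s) is the earliest date within each person's group;
  # %/% months(1) keeps only the completed months of the interval.
  mutate(output = interval(min(s), s) %/% months(1)) %>%
  ungroup()

answer
```

If rounding to the nearest month is what is wanted, a different definition of "month" is needed, because integer division always rounds down.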

Answer 1: (score: 2)

We can use dplyr:


n = c("john","jane","tim","john","jimmy","tim","jane","john","jimmy")
s = c("2012-03-21","2013-02-12","2014-01-01","2012-05-21","2010-12-17","2012-01-21","2013-03-12","2013-08-21","2010-09-17")
s = as.Date(s)
df = data.frame(n,s)


library(dplyr)

df %>% 
  group_by(n) %>% 
  mutate(out = round(as.integer(difftime(s, s[which.min(s)], units = 'days')) / 30, 0))
#> # A tibble: 9 x 3
#> # Groups:   n [4]
#>        n          s   out
#>   <fctr>     <date> <dbl>
#> 1   john 2012-03-21     0
#> 2   jane 2013-02-12     0
#> 3    tim 2014-01-01    24
#> 4   john 2012-05-21     2
#> 5  jimmy 2010-12-17     3
#> 6    tim 2012-01-21     0
#> 7   jane 2013-03-12     1
#> 8   john 2013-08-21    17
#> 9  jimmy 2010-09-17     0

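As always, counting months is tricky because different months have different lengths. The code above sidesteps this by treating a month as 30 days and rounding to the nearest whole number.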
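One way to avoid the 30-day approximation is to compare calendar months directly. The sketch below is an alternative under a stated assumption, not part of either answer: it defines the difference as (year difference × 12 + month difference) and ignores the day of the month entirely. `month_index` is a hypothetical helper name introduced only for this illustration. It uses base R only.

```r
# Data from the question.
n <- c("john", "jane", "tim", "john", "jimmy", "tim", "jane", "john", "jimmy")
s <- as.Date(c("2012-03-21", "2013-02-12", "2014-01-01", "2012-05-21", "2010-12-17",
               "2012-01-21", "2013-03-12", "2013-08-21", "2010-09-17"))
df <- data.frame(n, s, stringsAsFactors = FALSE)

# Hypothetical helper: position of a date's month on a continuous scale.
# POSIXlt stores years since 1900 and months as 0-11.
month_index <- function(d) {
  lt <- as.POSIXlt(d)
  (lt$year + 1900L) * 12L + lt$mon
}

idx <- month_index(df$s)
# ave() computes min() within each person and returns a vector aligned with df's rows.
df$output <- idx - ave(idx, df$n, FUN = min)
df
```

On this dataset the result matches the expected output in the question (0, 0, 24, 2, 3, 0, 1, 17, 0). Because days are ignored, a pair such as 2012-01-31 and 2012-02-01 would count as one month apart, which may or may not be what is wanted.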