Calculating the number of months between factor time variables

Asked: 2010-10-05 17:39:24

Tags: r

I have a time series stored as factors, like this:

df <- data.frame(a=c("11-JUL-2004", "11-JUL-2005", "11-JUL-2006", 
                   "11-JUL-2007", "11-JUL-2008"),
                 b=c("11-JUN-1999", "11-JUN-2000", "11-JUN-2001", 
                     "11-JUN-2002", "11-JUN-2003"))

First, I want to convert these to R's native date format. Second, I want to calculate the number of months between the two columns.

Update

Basically, I'm trying to recreate in R something I do in SPSS.

In SPSS I would:

  1. Convert the strings to the date format DD-MMM-YYYY.
  2. COMPUTE. RND((A-B)/60/60/24/30.416)
  3. 30.416 is shorthand for 365/12. I don't care much about month edge cases, hence the rounding.

5 answers:

Answer 0: (score: 4)

df <- data.frame(c("11-JUL-2004","11-JUL-2005","11-JUL-2006","11-JUL-2007","11-JUL-2008"),
                 c("11-JUN-1999","11-JUN-2000","11-JUN-2001","11-JUN-2002","11-JUN-2003"))
names(df) <- c("X1","X2")
df <- within(df, X1 <- as.Date(X1, format = "%d-%b-%Y"))
df <- within(df, X2 <- as.Date(X2, format = "%d-%b-%Y"))

Then difftime() gives the difference in weeks:

> with(df, difftime(X1, X2, units = "weeks"))
Time differences in weeks
[1] 265.2857 265.1429 265.1429 265.1429 265.2857

Or, if we use Brandon's approximation:

> with(df, difftime(X1, X2) / 30.416)
Time differences in days
[1] 61.05339 61.02052 61.02052 61.02052 61.05339

The closest I can get with lubridate (as highlighted by Dirk), using the df above, is:

> m <- with(df, as.period(subtract_dates(X1, X2)))
> m
[1] 5 years and 1 month   5 years and 1 month   5 years and 1 month   5 years and 1 month   5 years and 1 month
> str(m)
Classes ‘period’ and 'data.frame':  5 obs. of  6 variables:
 $ year  : int  5 5 5 5 5
 $ month : int  1 1 1 1 1
 $ day   : num  0 0 0 0 0
 $ hour  : int  0 0 0 0 0
 $ minute: int  0 0 0 0 0
 $ second: num  0 0 0 0 0
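
One caveat on the as.Date() conversion at the top of this answer: the "%b" format depends on the session's locale recognising English month abbreviations. A minimal locale-independent sketch follows, assuming the values always look like DD-MMM-YYYY with English abbreviations; parse_dd_mmm_yyyy is a hypothetical helper name, not a base R function.

```r
# Hypothetical helper: parse "11-JUL-2004"-style values without relying on the locale.
parse_dd_mmm_yyyy <- function(x) {
  x <- as.character(x)                       # factors must become strings first
  parts <- strsplit(x, "-", fixed = TRUE)    # list of c(day, month, year)
  day  <- as.integer(vapply(parts, `[`, character(1), 1))
  mon  <- match(toupper(vapply(parts, `[`, character(1), 2)), toupper(month.abb))
  year <- as.integer(vapply(parts, `[`, character(1), 3))
  # rebuild an ISO string, which as.Date() parses the same way in every locale
  as.Date(sprintf("%04d-%02d-%02d", year, mon, day), format = "%Y-%m-%d")
}

dates <- parse_dd_mmm_yyyy(factor(c("11-JUL-2004", "11-JUN-1999")))
format(dates)
```

The month lookup uses the built-in month.abb constant, so it should behave the same regardless of LC_TIME.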

Answer 1: (score: 3)

Josh has a point about how hard it is to say what a month means. The lubridate package has some answers.

As far as base R goes, we can answer in weeks:

> df[,"pa"] <- as.POSIXct(strptime(as.character(df$a),
+                         format="%d-%b-%Y", tz="GMT"))
> df[,"pb"] <- as.POSIXct(strptime(as.character(df$b),
+                         format="%d-%b-%Y", tz="GMT"))
> df[,"weeks"] <- difftime(df$pa, df$pb, units="weeks")
> df[,"months"] <- difftime(df$pa, df$pb, units="days")/30.416
> df
            a           b         pa         pb        weeks      months
1 11-JUL-2004 11-JUN-1999 2004-07-11 1999-06-11 265.29 weeks 61.053 days
2 11-JUL-2005 11-JUN-2000 2005-07-11 2000-06-11 265.14 weeks 61.021 days
3 11-JUL-2006 11-JUN-2001 2006-07-11 2001-06-11 265.14 weeks 61.021 days
4 11-JUL-2007 11-JUN-2002 2007-07-11 2002-06-11 265.14 weeks 61.021 days
5 11-JUL-2008 11-JUN-2003 2008-07-11 2003-06-11 265.29 weeks 61.053 days
> 

This uses the changed data.frame from my edit, so that we have proper column names. If you wrap difftime() in as.numeric(), you get plain numbers as well.
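
As a sketch of the as.numeric() point: converting the difftime result to a plain number drops the units label, which makes it straightforward to round as in the SPSS RND() step. The dates are written in ISO form here so the example does not depend on locale, and the variable names are illustrative.

```r
pa <- as.POSIXct(c("2004-07-11", "2005-07-11"), tz = "GMT")
pb <- as.POSIXct(c("1999-06-11", "2000-06-11"), tz = "GMT")

# difftime() keeps a units attribute; as.numeric() returns bare numbers
days <- as.numeric(difftime(pa, pb, units = "days"))

# approximate months, using 30.416 days per month as in the question
months_approx <- round(days / 30.416)
months_approx
```

Passing units = "days" explicitly avoids depending on the unit difftime() would otherwise pick automatically.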

Answer 2: (score: 3)

Brandon,

You can do this with the lubridate package.

> library(lubridate)

Tell R that these are dates. Use the dmy() parser function, since the dates are in day, month, year order (i.e. dmy).

> df <- transform(df, a = dmy(a), b = dmy(b))

Calculate the difference as a period. This gives you the number of whole years, months, days, etc.

> diff <- as.period(df$a - df$b)

Use arithmetic to convert the result to months.

> 12* diff$year + diff$month

These are all 61 months apart. This floors the result to whole months. If you want to round based on the days instead, you could do:
> 12* diff$year + diff$month + round(diff$day/30)

I'm working on making these steps easier and more intuitive in the next version of lubridate.
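
For comparison, the 12 * year + month arithmetic can be sketched in base R using POSIXlt fields. This is an illustration under stated assumptions, not lubridate's implementation: whole_months is a hypothetical helper, it assumes `to` is not earlier than `from`, and it counts a month as complete once the same day of the month is reached.

```r
# Hypothetical helper: completed calendar months from `from` to `to` (Date vectors).
whole_months <- function(from, to) {
  lf <- as.POSIXlt(from)
  lt <- as.POSIXlt(to)
  months <- 12 * (lt$year - lf$year) + (lt$mon - lf$mon)
  # drop the last month if its day of month has not been reached yet
  months - (lt$mday < lf$mday)
}

a <- as.Date(c("2004-07-11", "2004-07-10"))
b <- as.Date(c("1999-06-11", "1999-06-11"))
whole_months(b, a)
```

The second pair is one day short of a full 61st month, so it counts one month fewer than the first.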

Answer 3: (score: 2)

> Data <- data.frame(
+ V1=c("11-JUL-2004","11-JUL-2005","11-JUL-2006","11-JUL-2007","11-JUL-2008"),
+ V2=c("11-JUN-1999","11-JUN-2000","11-JUN-2001","11-JUN-2002","11-JUN-2003"))
> Data[,1] <- as.Date(Data[,1],"%d-%b-%Y")
> Data[,2] <- as.Date(Data[,2],"%d-%b-%Y")
> # Assuming 30 days per month
> (Data[,1]-Data[,2])/30
Time differences in days
[1] 61.90000 61.86667 61.86667 61.86667 61.90000
> # Assuming 30.416 days per month
> (Data[,1]-Data[,2])/30.416
Time differences in days
[1] 61.05339 61.02052 61.02052 61.02052 61.05339
> # Assuming month crosses
> require(zoo)
> Data[,1] <- as.yearmon(Data[,1])
> Data[,2] <- as.yearmon(Data[,2])
> (Data[,1]-Data[,2])*12
[1] 61 61 61 61 61

Answer 4: (score: 2)

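Number 1 below seems closest to what you asked for, but depending on your purpose you may also want to consider 2 and 3. If you want fractional months, also try numbers 1 and 3 without the rounding.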

# first convert columns of df to "Date" class
df[] <- lapply(df, as.Date, "%d-%b-%Y")

# 1. difference in days divided by 365.25/12
with(df, round((as.numeric(a) - as.numeric(b)) / (365.25/12)))

# 2. convert to 1st of month & then take diff in mos
library(zoo)
with(df, 12 * (as.yearmon(a) - as.yearmon(b)))

# 3. business style difference in months. See: ?"mondate-class"
library(mondate)
with(df, round(as.numeric(mondate(a) - mondate(b))))
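
To see why the choice matters, here is a small base R sketch of an edge case (the dates are chosen for illustration and are not from the question): two dates one day apart that straddle a month boundary. The day-based calculation from approach 1 rounds to 0, while counting month crossings, which is the idea behind the yearmon approach in 2, gives 1. The crossing count is computed from calendar fields here so the sketch needs no extra packages.

```r
a <- as.Date("2010-02-01")
b <- as.Date("2010-01-31")

# approach 1: days divided by the average month length, then rounded
by_days <- round((as.numeric(a) - as.numeric(b)) / (365.25 / 12))

# month crossings, computed from calendar fields instead of zoo::as.yearmon
la <- as.POSIXlt(a)
lb <- as.POSIXlt(b)
by_crossings <- 12 * (la$year - lb$year) + (la$mon - lb$mon)

c(by_days = by_days, by_crossings = by_crossings)
```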