I have a time series stored as factors, like this:
df <- data.frame(a=c("11-JUL-2004", "11-JUL-2005", "11-JUL-2006",
"11-JUL-2007", "11-JUL-2008"),
b=c("11-JUN-1999", "11-JUN-2000", "11-JUN-2001",
"11-JUN-2002", "11-JUN-2003"))
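First, I want to convert it to R's native date format. Second, I want to calculate the number of months between the two columns.
Basically I am trying to recreate in R something I did in SPSS.
In SPSS I would:
30.416 is short for 365/12. I am not too concerned about month edge cases, hence the rounding.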
Answer 0 (score: 4)
df <- data.frame(c("11-JUL-2004","11-JUL-2005","11-JUL-2006","11-JUL-2007","11-JUL-2008"),
c("11-JUN-1999","11-JUN-2000","11-JUN-2001","11-JUN-2002","11-JUN-2003"))
names(df) <- c("X1","X2")
df <- within(df, X1 <- as.Date(X1, format = "%d-%b-%Y"))
df <- within(df, X2 <- as.Date(X2, format = "%d-%b-%Y"))
Then difftime() gives the difference in weeks:
> with(df, difftime(X1, X2, units = "weeks"))
Time differences in weeks
[1] 265.2857 265.1429 265.1429 265.1429 265.2857
Or, if we use Brandon's approximation:
> with(df, difftime(X1, X2) / 30.416)
Time differences in days
[1] 61.05339 61.02052 61.02052 61.02052 61.05339
The closest I can get uses lubridate (as highlighted by Dirk), with the df from above:
> m <- with(df, as.period(subtract_dates(X1, X2)))
> m
[1] 5 years and 1 month 5 years and 1 month 5 years and 1 month 5 years and 1 month 5 years and 1 month
> str(m)
Classes ‘period’ and 'data.frame': 5 obs. of 6 variables:
$ year : int 5 5 5 5 5
$ month : int 1 1 1 1 1
$ day : num 0 0 0 0 0
$ hour : int 0 0 0 0 0
$ minute: int 0 0 0 0 0
$ second: num 0 0 0 0 0
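Answer 1 (score: 3)
Josh makes a good point about how hard it is to pin down what a month means. The lubridate package has some answers.
As far as base R goes, we can answer in weeks: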
> df[,"pa"] <- as.POSIXct(strptime(as.character(df$a),
+ format="%d-%B-%Y", tz="GMT"))
> df[,"pb"] <- as.POSIXct(strptime(as.character(df$b),
+ format="%d-%B-%Y",tz="GMT"))
> df[,"weeks"] <- difftime(df$pa, df$pb, unit="weeks")
> df[,"months"] <- difftime(df$pa, df$pb, unit="days")/30.416
> df
a b pa pb weeks months
1 11-JUL-2004 11-JUN-1999 2004-07-11 1999-06-11 265.29 weeks 61.053 days
2 11-JUL-2005 11-JUN-2000 2005-07-11 2000-06-11 265.14 weeks 61.021 days
3 11-JUL-2006 11-JUN-2001 2006-07-11 2001-06-11 265.14 weeks 61.021 days
4 11-JUL-2007 11-JUN-2002 2007-07-11 2002-06-11 265.14 weeks 61.021 days
5 11-JUL-2008 11-JUN-2003 2008-07-11 2003-06-11 265.29 weeks 61.053 days
>
This uses the changed data.frame from my edit, so that we have proper column names. If you wrap difftime() in as.numeric(), you get plain numbers as well.
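As a minimal sketch of that last point: wrapping difftime() in as.numeric() drops the units label, so the result can be rounded like any other number. It assumes the question's df with columns a and b, and an English LC_TIME locale so that %b matches "JUL" and "JUN". The names days and months_approx are illustrative and not part of the original answer.

```r
# Rebuild the question's data; stringsAsFactors = TRUE mimics the factor columns
df <- data.frame(
  a = c("11-JUL-2004", "11-JUL-2005", "11-JUL-2006", "11-JUL-2007", "11-JUL-2008"),
  b = c("11-JUN-1999", "11-JUN-2000", "11-JUN-2001", "11-JUN-2002", "11-JUN-2003"),
  stringsAsFactors = TRUE
)

# Parse to Date; as.character() makes sure the labels, not the factor codes, are used
# Assumption: month abbreviations are parsed under an English locale
a <- as.Date(as.character(df$a), format = "%d-%b-%Y")
b <- as.Date(as.character(df$b), format = "%d-%b-%Y")

# as.numeric() strips the "difftime" class and its units label
days <- as.numeric(difftime(a, b, units = "days"))

# Average month length of 365/12 days, rounded as described in the question
months_approx <- round(days / (365 / 12))
months_approx
```

Dropping the round() call keeps the fractional months instead.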
Answer 2 (score: 3)
Brandon,
You can do this with the lubridate package.
> library(lubridate)
Tell R that these are dates. Use the dmy() parser function, because the dates are in day, month, year order (i.e. dmy).
> df <- transform(df, a = dmy(a), b = dmy(b))
Calculate the difference as a period. This gives you the number of whole years, months, days, and so on.
> diff <- as.period(df$a - df$b)
Use arithmetic to convert the result to months.
> 12* diff$year + diff$month
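These are all 61 months apart. This truncates to whole months. If you would rather round based on the number of days, you could do something like
> 12* diff$year + diff$month + round(diff$day/30)
I am working on making these steps easier and more intuitive in the next version of lubridate.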
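The same idea can be sketched with interval arithmetic. This is not the answer's own method: it assumes a lubridate version that provides interval(), time_length(), and integer division of an interval by months(1). The names span, whole_months, and frac_months are illustrative.

```r
library(lubridate)

# Parse day-month-year strings into Date objects
a <- dmy(c("11-JUL-2004", "11-JUL-2005", "11-JUL-2006", "11-JUL-2007", "11-JUL-2008"))
b <- dmy(c("11-JUN-1999", "11-JUN-2000", "11-JUN-2001", "11-JUN-2002", "11-JUN-2003"))

# An interval keeps both endpoints, so month lengths are taken from the calendar
span <- interval(b, a)

# Number of whole calendar months that fit in each interval
whole_months <- span %/% months(1)

# Length of each interval expressed in (possibly fractional) months
frac_months <- time_length(span, "month")

whole_months
```

Because an interval is anchored to real dates, this avoids choosing a fixed number of days per month.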
Answer 3 (score: 2)
> Data <- data.frame(
+ V1=c("11-JUL-2004","11-JUL-2005","11-JUL-2006","11-JUL-2007","11-JUL-2008"),
+ V2=c("11-JUN-1999","11-JUN-2000","11-JUN-2001","11-JUN-2002","11-JUN-2003"))
> Data[,1] <- as.Date(Data[,1],"%d-%b-%Y")
> Data[,2] <- as.Date(Data[,2],"%d-%b-%Y")
> # Assuming 30 days per month
> (Data[,1]-Data[,2])/30
Time differences in days
[1] 61.90000 61.86667 61.86667 61.86667 61.90000
> # Assuming 30.416 days per month
> (Data[,1]-Data[,2])/30.416
Time differences in days
[1] 61.05339 61.02052 61.02052 61.02052 61.05339
> # Counting calendar-month boundaries crossed (day of month is ignored)
> require(zoo)
> Data[,1] <- as.yearmon(Data[,1])
> Data[,2] <- as.yearmon(Data[,2])
> (Data[,1]-Data[,2])*12
[1] 61 61 61 61 61
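Answer 4 (score: 2)
Number 1 below seems closest to what you asked for, but depending on your purpose you may also want to consider 2 and 3. If you want fractional months, also try numbers 1 and 3 without the rounding.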
# first convert columns of df to "Date" class
df[] <- lapply(df, as.Date, "%d-%b-%Y")
# 1. difference in days divided by 365.25/12
with(df, round((as.numeric(a) - as.numeric(b)) / (365.25/12)))
# 2. convert to 1st of month & then take diff in mos
library(zoo)
with(df, 12 * (as.yearmon(a) - as.yearmon(b)))
# 3. business style difference in months. See: ?"mondate-class"
library(mondate)
with(df, round(as.numeric(mondate(a) - mondate(b))))
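Option 2 can also be sketched in base R without zoo, assuming only the calendar year and month matter and the day of month is ignored. The helper name month_index is hypothetical, and ISO-formatted dates are used here so that the example does not depend on the locale.

```r
# Hypothetical helper: a running month count, ignoring the day of month
month_index <- function(d) {
  lt <- as.POSIXlt(d)
  # lt$year is years since 1900, lt$mon is 0-based (January = 0)
  12 * (lt$year + 1900) + lt$mon
}

a <- as.Date(c("2004-07-11", "2005-07-11"))
b <- as.Date(c("1999-06-11", "2000-06-11"))

# Difference in calendar months between the two columns
month_index(a) - month_index(b)
```

Note that this counts month boundaries crossed, so 30 June to 1 July counts as one month.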