Handling dates in R

Time: 2014-12-12 15:19:29

Tags: r date

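Thanks in advance for your help. I'm trying to do something that I'm sure is simple, but I can't figure it out. I want to make a scatterplot in R with dates on the x-axis and relative frequency on the y-axis. The problem is that the points all bunch up at certain places along the x-axis, because R does not realize the numbers are dates. The dates are in the format yearmonthset, where set is 0, 1, or 2 within that month. So 12071 is set 1 of July 2012. I need R to recognize that there are 3 sets in a month, 12 months in a year, and 4 years, and to space the scatterplot accordingly. What is the best way to go about this?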

Here is a snippet of my data:

ID,Date,Trigram,Freq,Relfreq
TPN,12071,a constitutional convention,6,0.00001211467753757064371339095882
TPN,12111,a constitutional convention,2,0.000003302558987831721409334022467
TPN,11071,a constitutional convention,6,0.00001211467753757064371339095882
TPN,11111,a constitutional convention,2,0.000003302558987831721409334022467
TPN,10071,a constitutional convention,6,0.00001211467753757064371339095882
TPN,10111,a constitutional convention,2,0.000003302558987831721409334022467
TPN,09071,a constitutional convention,6,0.00001211467753757064371339095882
TPN,09111,a constitutional convention,2,0.000003302558987831721409334022467
CR,10032,a constitutional convention,3,0.000001049388049359016289650690200
CR,10062,a constitutional convention,2,7.020490002120187980640296770E-7

I tried using as.Date() as described at http://www.statmethods.net/input/dates.html, but I could not get it to work.

> strdates <- origina_NoCon$Date
> dates <- as.Date(strdates, %y%m)
Error: unexpected SPECIAL in "dates <- as.Date(strdates, %y%"

Edit:

Here is part of the output of dput(strdates):

> dput(strdates)
c(12071L, 12111L, 11071L, 11111L, 10071L, 10111L, 9071L, 9111L, 
10032L, 10062L, 11041L, 11071L, 11111L, 11121L, 12020L, 12021L, 
12102L, 12110L, 12111L, 11021L,...)

1 answer:

Answer 0: (score: 1)

Let's first convert your strdates to character, padding with leading zeros where needed:

strdates.chr <- sprintf("%05i",strdates)
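The padding matters because values such as 9071 (read from 09071 in the data) have only four digits, so position-based substr calls would otherwise pick up the wrong characters. A small sketch of the difference, using three values taken from the dput output in the question:

```r
# Three values copied from the dput() output in the question
strdates <- c(12071L, 9071L, 10032L)

# Plain conversion keeps 9071 at four characters
as.character(strdates)

# "%05i" pads every value to five characters with leading zeros
strdates.chr <- sprintf("%05i", strdates)
strdates.chr
```

After padding, characters 1-2 are always the year, 3-4 the month, and 5 the set.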

You can now reformat them to represent the first day of each month, and then convert the result to Date:

> as.Date(paste0(substr(strdates.chr,1,4),"01"),format="%y%m%d")
 [1] "2012-07-01" "2012-11-01" "2011-07-01" "2011-11-01" "2010-07-01"
 [6] "2010-11-01" "2009-07-01" "2009-11-01" "2010-03-01" "2010-06-01"
[11] "2011-04-01" "2011-07-01" "2011-11-01" "2011-12-01" "2012-02-01"
[16] "2012-02-01" "2012-10-01" "2012-11-01" "2012-11-01" "2011-02-01"
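Two details in the call above are worth spelling out. The format argument is a quoted string, whereas the attempt in the question passed a bare %y%m, which R rejects at parse time. And the "01" is appended because, as far as I understand the strptime documentation, a day has to be supplied whenever a month is specified; a year-month string on its own should come back as NA. A minimal sketch, where the variable names are only illustrative:

```r
code <- "12071"   # one zero-padded value from the question

# Quoted format, but no day component in the input
ym_only <- as.Date(substr(code, 1, 4), format = "%y%m")

# Same input with "01" appended, so %d has something to match
with_day <- as.Date(paste0(substr(code, 1, 4), "01"), format = "%y%m%d")

ym_only
with_day
```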

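To include the last piece of information (the set), extract it with substr, convert it to numeric, and add the appropriate multiple of 10 days (adding a number to a Date is automatically interpreted as adding that many days):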

> as.Date(paste0(substr(strdates.chr,1,4),"01"),format="%y%m%d")+
+ as.numeric(substr(strdates.chr,5,5))*10
 [1] "2012-07-11" "2012-11-11" "2011-07-11" "2011-11-11" "2010-07-11"
 [6] "2010-11-11" "2009-07-11" "2009-11-11" "2010-03-21" "2010-06-21"
[11] "2011-04-11" "2011-07-11" "2011-11-11" "2011-12-11" "2012-02-01"
[16] "2012-02-11" "2012-10-21" "2012-11-01" "2012-11-11" "2011-02-11"
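Putting the steps above together, here is a rough sketch of how the conversion could feed the scatterplot asked for in the question. The data frame dat, the helper to_date, and the column ParsedDate are hypothetical names introduced for illustration; the Relfreq values are rounded from the question's snippet. Mapping sets 0, 1, 2 to the 1st, 11th, and 21st of the month is the convention used in this answer, not something the data itself dictates.

```r
# Hypothetical data frame with a few rows from the question's snippet
# (Relfreq rounded for readability)
dat <- data.frame(
  Date    = c(12071L, 12111L, 9071L, 10032L, 10062L),
  Relfreq = c(1.211e-05, 3.303e-06, 1.211e-05, 1.049e-06, 7.020e-07)
)

# Hypothetical helper: yearmonthset code -> Date
to_date <- function(code) {
  chr   <- sprintf("%05i", as.integer(code))           # pad to 5 characters
  first <- as.Date(paste0(substr(chr, 1, 4), "01"),    # first day of the month
                   format = "%y%m%d")
  first + as.numeric(substr(chr, 5, 5)) * 10           # set 0/1/2 -> +0/10/20 days
}

dat$ParsedDate <- to_date(dat$Date)

# Base graphics spaces the x-axis by actual time once x is a Date
plot(dat$ParsedDate, dat$Relfreq,
     xlab = "Date", ylab = "Relative frequency")
```

Because ParsedDate is a Date rather than an integer, the gaps between points on the x-axis reflect elapsed time instead of the arithmetic distance between codes such as 11121 and 12020.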