R

Time: 2016-03-15 14:48:41

Tags: r rsqlite

I found this SAS SQL code and would like to translate it to RSQLite.

proc sql;
create table crspcomp as
select a.*, b.ret, b.date
from ccm1 as a left join crsp.msf as b
on a.permno=b.permno
and intck('month',a.datadate,b.date)
between 3 and 14;
quit;

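The first problem is that R does not provide the intck function, which returns the difference in months between two dates. I found a similar function (on Stack Overflow), which looks like this: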

mob<-function (begin, end) {
  begin<-paste(substr(begin,1,6),"01",sep="")
  end<-paste(substr(end,1,6),"01",sep="")
  mob1<-as.period(interval(ymd(begin),ymd(end)))
  mob<-mob1@year*12+mob1@month
  mob
}

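I have tested the mob function outside of RSQLite and it works fine so far. Now I want to use mob inside the SQL statement above. In the SQL code I want to merge the data on permno, and in addition lag the data by 3 months (which is why I use the mob function).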
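For reference, the month count that intck('month', ...) produces (the number of calendar-month boundaries between two dates) can also be sketched in base R without lubridate. This is only a sketch: the names month_index and month_diff are made up here, and it assumes the dates are stored as yyyymmdd numbers or strings, as in the sample files.

```r
# Hypothetical helpers (not from the original post).
# Turn a yyyymmdd value into a running month number: year * 12 + month.
month_index <- function(x) {
  x <- as.numeric(x)                       # accepts 19990131 or "19990131"
  (x %/% 10000) * 12 + ((x %/% 100) %% 100)
}

# Number of month boundaries crossed going from `begin` to `end`.
month_diff <- function(begin, end) {
  month_index(end) - month_index(begin)
}

month_diff(19990131, 20000103)      # a January 1999 date against a January 2000 date
month_diff("20000131", "20000201")  # adjacent days in different months
```

Because only the year and month digits are used, the day of the month never affects the result, which is the same behaviour mob gets by forcing both dates to the first of the month.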

Annual_File looks like this:

GVKEY,datadate,fyear,fyr,bkvlps,permno
14489,19980131,1997,1,4.0155,11081
14489,19990131,1998,1,1.8254,11081
14489,20000131,1999,1,2.0614,11081
14489,20010131,2000,1,2.1615,11081
14489,20020131,2001,1,1.804,11081

The CRSP file looks like this:

permno,date,ret
11081,20000103,0.1
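11081,20000104,0.2

install.packages('DBI')
install.packages('RSQLite')
install.packages('lubridate')  # mob() below relies on lubridate

library(DBI)
library(RSQLite)
library(lubridate)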


mob<-function (begin, end) {
  begin<-paste(substr(begin,1,6),"01",sep="")
  end<-paste(substr(end,1,6),"01",sep="")
  mob1<-as.period(interval(ymd(begin),ymd(end)))
  mob<-mob1@year*12+mob1@month
  mob
}

Annual_File <- "C:/Users/XYZ"
Annual_File <- paste0(Annual_File, ".csv")

inputFile <- "C:/Users/XYZ"
inputFile <- paste0(inputFile, ".csv")

con <- dbConnect(RSQLite::SQLite(), dbname='CCM')

dbWriteTable(con, name="CRSP", value=inputFile, row.names=FALSE, header=TRUE, overwrite=TRUE)
dbWriteTable(con, name="Annual_File", value=Annual_File, row.names=FALSE, header=TRUE, overwrite=TRUE)



DSQL <- "select a.*, b.ret, b.date
         from Annual_File as a left join
         CRSP as b
         on a.permno=b.PERMNO
         and mob(a.datadate,b.date)
               between 3 and 14"

yourData <- dbGetQuery(con, DSQL)

I cannot even get the function recognized; the error looks like this:

Error in sqliteSendQuery(con, statement, bind.data) : 
  error in statement: no such function: mob

1 answer:

Answer 0: (score: 1)

In SQLite you can only use SQL functions (and functions written in C). You cannot use R functions.
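Since the dates in the question's CSV files are plain yyyymmdd integers, one way to stay inside SQLite is to write the month difference as ordinary SQL arithmetic instead of calling mob. The following is only a sketch under that assumption, using an in-memory database and the sample rows from the question; it is not part of the original answer.

```r
library(DBI)

# Sample rows from the question; read.csv keeps the dates as integers.
annual <- read.csv(text = "GVKEY,datadate,fyear,fyr,bkvlps,permno
14489,19980131,1997,1,4.0155,11081
14489,19990131,1998,1,1.8254,11081
14489,20000131,1999,1,2.0614,11081
14489,20010131,2000,1,2.1615,11081
14489,20020131,2001,1,1.804,11081")

crsp <- read.csv(text = "permno,date,ret
11081,20000103,0.1
11081,20000104,0.2")

con <- dbConnect(RSQLite::SQLite(), ":memory:")
dbWriteTable(con, "Annual_File", annual)
dbWriteTable(con, "CRSP", crsp)

# Assumes both date columns are stored as INTEGER, so that "/" truncates:
#   date / 10000        -> year
#   (date / 100) % 100  -> month
sql <- "
select a.*, b.ret, b.date
from Annual_File as a
left join CRSP as b
  on a.permno = b.permno
 and ( ((b.date / 10000) - (a.datadate / 10000)) * 12
       + (((b.date / 100) % 100) - ((a.datadate / 100) % 100)) )
     between 3 and 14"

res <- dbGetQuery(con, sql)
dbDisconnect(con)
res
```

If the columns were imported as text or real numbers, the integer division would not truncate, so they would have to be cast to integer first.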

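Also, SQLite is not well suited to date processing because it has no date and time types. A workaround using the functions that SQLite does provide is possible (see the note at the end), but I suggest using the H2 database instead. It has datediff built in. Note that, depending on what you need, you may have to reverse the order of the last two arguments to datediff.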

library(RH2)
library(sqldf)

# create test data frames

Lines1 <- "GVKEY,datadate,fyear,fyr,bkvlps,permno
14489,19980131,1997,1,4.0155,11081
14489,19990131,1998,1,1.8254,11081
14489,20000131,1999,1,2.0614,11081
14489,20010131,2000,1,2.1615,11081
14489,20020131,2001,1,1.804,11081"

Lines2 <- "permno,date,ret
11081,20000103,0.1
11081,20000104,0.2"

fmt <- "%Y%m%d"

Annual_File <- read.csv(text = Lines1)
Annual_File$datadate <- as.Date(as.character(Annual_File$datadate), format = fmt)

CRSP <- read.csv(text = Lines2)
CRSP$date <- as.Date(as.character(CRSP$date), format = fmt)

# run SQL statement using sqldf

sqldf("select a.*, b.ret, b.date, datediff('month', a.datadate, b.date) diff
          from Annual_File as a 
          left join CRSP as b 
          on a.permno = b.permno and 
             datediff('month', a.datadate, b.date) between 3 and 14")

which gives:

  GVKEY   datadate fyear fyr bkvlps permno ret       date diff
1 14489 1998-01-31  1997   1 4.0155  11081  NA       <NA>   NA
2 14489 1999-01-31  1998   1 1.8254  11081 0.1 2000-01-03   12
3 14489 1999-01-31  1998   1 1.8254  11081 0.2 2000-01-04   12
4 14489 2000-01-31  1999   1 2.0614  11081  NA       <NA>   NA
5 14489 2001-01-31  2000   1 2.1615  11081  NA       <NA>   NA
6 14489 2002-01-31  2001   1 1.8040  11081  NA       <NA>   NA
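As a cross-check on the output above, the same left join can be sketched in base R with merge(), without any database. This assumes the sample data from this answer; month_number is a helper name invented for the sketch, and counting calendar-month boundaries is assumed to match what datediff('month', ...) returned above.

```r
fmt <- "%Y%m%d"

Annual_File <- read.csv(text = "GVKEY,datadate,fyear,fyr,bkvlps,permno
14489,19980131,1997,1,4.0155,11081
14489,19990131,1998,1,1.8254,11081
14489,20000131,1999,1,2.0614,11081
14489,20010131,2000,1,2.1615,11081
14489,20020131,2001,1,1.804,11081")
Annual_File$datadate <- as.Date(as.character(Annual_File$datadate), format = fmt)

CRSP <- read.csv(text = "permno,date,ret
11081,20000103,0.1
11081,20000104,0.2")
CRSP$date <- as.Date(as.character(CRSP$date), format = fmt)

# Hypothetical helper: running month number of a Date (year * 12 + month).
month_number <- function(d) {
  lt <- as.POSIXlt(d)
  (lt$year + 1900) * 12 + lt$mon
}

# 1. Pair every annual row with every return row for the same permno.
pairs <- merge(Annual_File, CRSP, by = "permno")

# 2. Keep only the pairs that are 3 to 14 months apart.
gap <- month_number(pairs$date) - month_number(pairs$datadate)
pairs <- pairs[gap >= 3 & gap <= 14, c("GVKEY", "datadate", "ret", "date")]

# 3. Left join back, so annual rows without a match are kept with NA.
result <- merge(Annual_File, pairs, by = c("GVKEY", "datadate"), all.x = TRUE)
result
```

Step 1 builds the full set of permno pairs in memory, so on large CRSP files the database route is likely to scale better.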

Note: To use SQLite, use 2440588.5 to convert between R's date origin (the UNIX epoch) and the date origin that the SQLite functions assume.

library(sqldf)
try(detach("package:RH2"), silent = TRUE)  # detach RH2 if present

sqldf("select a.*, b.ret, b.date
          from Annual_File as a 
          left join CRSP as b 
          on a.permno = b.permno and 
             b.date + 2440588.5 between julianday(a.datadate + 2440588.5, '+3 months') and 
                                        julianday(a.datadate + 2440588.5, '+12 months')")