How to read the CBOE csv file using data.table

Time: 2014-08-16 11:06:54

Tags: r data.table

I am trying to use fread to read http://www.cboe.com/publish/ScheduledTask/MktData/datahouse/pcratioarchive.csv (more information about the data is available at http://www.cboe.com/data/PutCallRatio.aspx):

library(data.table)
download.file(url="http://www.cboe.com/publish/ScheduledTask/MktData/datahouse/pcratioarchive.csv", destfile="pcratioarchive.csv")
outDT <- fread("pcratioarchive.csv", header=FALSE, skip=4)

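Somehow this runs into a strange error, and I cannot see the cause in the pcratioarchive.csv file itself:

> outDT <- fread("pcratioarchive.csv", header=FALSE, skip=4)
Error in fread("pcratioarchive.csv", header = FALSE, skip = 4) :
  Expected sep (',') but new line, EOF (or other non printing character) ends field 2 on line 6 when detecting types: 12/2/1999,0.52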

Is there a way to make this work with data.table without manually editing pcratioarchive.csv?

My session info:

> sessionInfo()
R version 3.1.1 (2014-07-10)
Platform: x86_64-pc-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8          LC_NUMERIC=C                 
 [3] LC_TIME=en_US.UTF-8           LC_COLLATE=en_US.UTF-8       
 [5] LC_MONETARY=en_US.UTF-8       LC_MESSAGES=en_US.UTF-8      
 [7] LC_PAPER=en_US.UTF-8          LC_NAME=en_US.UTF-8          
 [9] LC_ADDRESS=en_US.UTF-8        LC_TELEPHONE=en_US.UTF-8     
[11] LC_MEASUREMENT=en_US.UTF-8    LC_IDENTIFICATION=en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] data.table_1.9.3 rj_1.1.3-1      

loaded via a namespace (and not attached):
[1] plyr_1.8.1    Rcpp_0.11.1   reshape2_1.4  rj.gd_1.1.3-1 stringr_0.6.2
[6] tools_3.1.1  

1 answer:

Answer 0: (score: 3)

The pcratioarchive.csv file is malformed: its rows do not all have the same number of fields. For example:

...
10/12/1995,0.63,,,
10/13/1995,0.76,,,
10/16/1995,0.87
10/17/1995,0.76
...
10/17/2003,0.64,,
10/20/2003,0.62,,
10/21/2003,0.7,1.27,0.59
10/22/2003,0.98,1.89,0.77
...
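
One way to confirm the ragged rows without opening the file is to count the fields on each line. The sketch below uses count.fields from base R on a small made-up sample that mimics the excerpt above; the sample lines and the temporary file are assumptions for illustration, not the real CBOE data.

```r
# Made-up sample with the same kind of ragged rows as the excerpt above.
sample_lines <- c(
  "10/13/1995,0.76,,,",
  "10/16/1995,0.87",
  "10/20/2003,0.62,,",
  "10/21/2003,0.7,1.27,0.59"
)
sample_file <- tempfile(fileext = ".csv")
writeLines(sample_lines, sample_file)

# Number of comma-separated fields found on each line.
n_fields <- count.fields(sample_file, sep = ",")

# More than one distinct count means the rows are not rectangular.
ragged <- length(unique(n_fields)) > 1
print(table(n_fields))
```

On the real file the equivalent call would be count.fields("pcratioarchive.csv", sep = ",", skip = 4).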

I'm not familiar enough with fread to know whether it has an argument for handling this, but read.csv does:

x <- read.csv("pcratioarchive.csv", header=FALSE, skip=4, fill=TRUE)
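
If the goal is still to end up with a data.table, the result of read.csv can be converted afterwards. Below is a minimal sketch on a made-up ragged sample. The column names (date, ratio1, ratio2, ratio3, extra) are hypothetical, chosen only for illustration. Passing col.names fixes the number of columns up front, so read.csv does not have to infer it from the first few lines of the file.

```r
# Made-up ragged sample; not the real CBOE data.
ragged_lines <- c(
  "10/12/1995,0.63,,,",
  "10/16/1995,0.87",
  "10/20/2003,0.62,,",
  "10/21/2003,0.7,1.27,0.59"
)
ragged_file <- tempfile(fileext = ".csv")
writeLines(ragged_lines, ragged_file)

# Hypothetical column names; the widest row in the sample has five fields.
cols <- c("date", "ratio1", "ratio2", "ratio3", "extra")

# fill = TRUE pads short rows with blank fields, which are read as NA.
df <- read.csv(ragged_file, header = FALSE, fill = TRUE,
               col.names = cols, stringsAsFactors = FALSE)

# Drop the always-empty trailing column and parse the dates.
df$extra <- NULL
df$date <- as.Date(df$date, format = "%m/%d/%Y")

# Convert to a data.table if the package is installed.
if (requireNamespace("data.table", quietly = TRUE)) {
  outDT <- data.table::as.data.table(df)
}
```

More recent data.table releases also give fread a fill argument, which may make the detour through read.csv unnecessary; I have not checked that against this particular file.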