Basic R function to load and clean stock data

Time: 2017-04-12 03:34:41

Tags: r

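I am trying to write a function that performs the following tasks for every ticker I load with "getSymbols". I have tried using lapply, but the function does not seem to work.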

library(quantmod)
getSymbols(c("XLF","VFH","XLI","VIS","RWO","IYR","VNQI","VGT","RYT","VPU","IDU"), src = "yahoo",from="2012-01-01" ) 

#NEED TO FIGURE OUT A FUNCTION FOR THIS
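XLI = as.data.frame(XLI)
XLI$date = row.names(XLI)
XLI[,c("XLI.Open","XLI.High", "XLI.Low", "XLI.Adjusted")] = NULL
XLI["ticker"]="XLI"
XLI["industry"]="industrials"
# remaining columns are Close, Volume, date, ticker, industry; put date first before renaming
XLI = XLI[,c("date","XLI.Close","XLI.Volume","ticker","industry")]
colnames(XLI) <- c("date","close","volume","ticker","industry")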

1 Answer:

Answer 0: (score: 1)

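Although your output uses the closing price, it is advisable to use the adjusted price column instead, because it is adjusted for corporate actions such as stock splits and dividends.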
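To make the close versus adjusted distinction concrete, here is a minimal sketch using quantmod's column extractors `Cl()` and `Ad()`. The series is synthetic (made-up values shaped like `getSymbols` output), so nothing is downloaded; the object name `px` is an assumption for illustration.

```r
library(quantmod)  # also attaches xts and zoo

# Synthetic three-day series shaped like getSymbols() output (made-up values)
dates <- as.Date("2012-01-03") + 0:2
px <- xts(
  cbind(XLI.Open = c(10, 11, 12), XLI.High = c(11, 12, 13),
        XLI.Low = c(9, 10, 11), XLI.Close = c(10.5, 11.5, 12.5),
        XLI.Volume = c(1000, 2000, 3000), XLI.Adjusted = c(9.5, 10.4, 11.3)),
  order.by = dates
)

adj <- Ad(px)  # picks the column whose name contains "Adjusted"
cls <- Cl(px)  # picks the column whose name contains "Close"
```

Selecting by name this way avoids depending on the position of the column in the object.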

I have used a test industry vector; you will need to replace its entries with the actual values.
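One way to supply real labels is a named vector keyed by ticker, so that each label is looked up by name rather than by position. The sketch below is not from the original answer: `industryMap` is a hypothetical name, and the sector labels are my reading of each fund's focus, so verify them before relying on them.

```r
tickerVec <- c("XLF","VFH","XLI","VIS","RWO","IYR","VNQI","VGT","RYT","VPU","IDU")

# Assumed sector labels (not from the original answer); verify before use
industryMap <- c(
  XLF = "financials", VFH = "financials",
  XLI = "industrials", VIS = "industrials",
  RWO = "real estate", IYR = "real estate", VNQI = "real estate",
  VGT = "technology", RYT = "technology",
  VPU = "utilities", IDU = "utilities"
)

# Fail early if any ticker lacks a label
missingTickers <- setdiff(tickerVec, names(industryMap))
stopifnot(length(missingTickers) == 0)

# Lookup by name does not depend on the order of either vector
industryVec <- unname(industryMap[tickerVec])
```

Because `industryVec` is built in the order of `tickerVec`, it can be indexed by the same position as the tickers.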

You can use new.env and lapply as follows:

library(quantmod)


tickerVec = c("XLF","VFH","XLI","VIS","RWO","IYR","VNQI","VGT","RYT","VPU","IDU")

#test industry vector, replace with actual sector names
industryVec = c("industrials","financials","materials","energy",
            "materials","energy","financials","technology","industrials","technology","energy")


startDt = as.Date("2012-01-01")

#create new data environment for storing all price timeseries

data.env = new.env()

getSymbols(tickerVec,env=data.env,src = "yahoo",from=startDt )      


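#convert to a list for ease of manipulation
#mget returns the series in tickerVec order; as.list() on an environment gives arbitrary order

data.env.lst = mget(tickerVec, envir = data.env)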

#create a function to reshape each timeseries into the required shape

fn_modifyData = function(x) {

  TS = data.env.lst[[x]]

  #xts to data.frame
  TS_DF = data.frame(date=as.Date(index(TS)),coredata(TS),stringsAsFactors=FALSE)

  #retain only required columns: 1 = date, 5 = Close, 6 = Volume
  TS_DF = TS_DF[,c(1,5,6)]

  TS_DF$ticker = tickerVec[x]
  TS_DF$industry = industryVec[x]
  colnames(TS_DF)  = c("date","close","volume","ticker","industry")
  row.names(TS_DF) = NULL

  return(TS_DF)

}

Output:

#apply function to all timeseries using lapply
outList = lapply(seq_along(data.env.lst), fn_modifyData)


head(outList[[1]])
#        date close    volume ticker    industry
#1 2012-01-03 13.34 103362000    XLF industrials
#2 2012-01-04 13.30  69833900    XLF industrials
#3 2012-01-05 13.48  89935300    XLF industrials
#4 2012-01-06 13.40  83878600    XLF industrials
#5 2012-01-09 13.47  69189600    XLF industrials
#6 2012-01-10 13.71  86035100    XLF industrials
head(outList[[11]])
#        date close volume ticker industry
#1 2012-01-03 50.55   6100    IDU   energy
#2 2012-01-04 50.41   2700    IDU   energy
#3 2012-01-05 50.83   1700    IDU   energy
#4 2012-01-06 50.82   7700    IDU   energy
#5 2012-01-09 51.25   1800    IDU   energy
#6 2012-01-10 51.71   5500    IDU   energy


#if you wish to combine all datasets 
outDF = do.call(rbind,outList)

head(outDF)
#        date close    volume ticker    industry
#1 2012-01-03 13.34 103362000    XLF industrials
#2 2012-01-04 13.30  69833900    XLF industrials
#3 2012-01-05 13.48  89935300    XLF industrials
#4 2012-01-06 13.40  83878600    XLF industrials
#5 2012-01-09 13.47  69189600    XLF industrials
#6 2012-01-10 13.71  86035100    XLF industrials
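As a variation on the approach above, the following sketch looks up each series by ticker name and selects columns with quantmod's `Cl()` and `Vo()` instead of positional indices. It runs on synthetic data so that no download is needed; `make_fake`, `reshape_one`, `env`, and `industry` are hypothetical names introduced here, and the prices are made up.

```r
library(quantmod)  # also attaches xts and zoo

# Hypothetical helper: fabricate a two-day OHLCV series for one ticker
make_fake <- function(ticker, base) {
  dates <- as.Date("2012-01-03") + 0:1
  m <- cbind(base, base + 1, base - 1, base + 0.5, c(100, 200), base + 0.4)
  colnames(m) <- paste(ticker,
                       c("Open", "High", "Low", "Close", "Volume", "Adjusted"),
                       sep = ".")
  xts(m, order.by = dates)
}

# Stand-in for the environment that getSymbols(..., env = ) would fill
env <- new.env()
assign("XLF", make_fake("XLF", 13), envir = env)
assign("XLI", make_fake("XLI", 35), envir = env)

# Labels keyed by ticker, so nothing depends on element order
industry <- c(XLF = "financials", XLI = "industrials")
tickers <- names(industry)

# Reshape one ticker's series into a long data frame
reshape_one <- function(ticker) {
  ts <- get(ticker, envir = env)
  data.frame(
    date = as.Date(index(ts)),
    close = as.numeric(Cl(ts)),   # use Ad(ts) here for adjusted prices
    volume = as.numeric(Vo(ts)),
    ticker = ticker,
    industry = unname(industry[ticker]),
    stringsAsFactors = FALSE
  )
}

out <- do.call(rbind, lapply(tickers, reshape_one))
```

Keying everything by ticker name means the result stays correct even if the tickers are listed in a different order from the labels.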