学习R,不知道如何解决这个问题。
library(quantmod)
library(xts)
# get market data
Nasdaq100_Symbols <- c("AAPL", "AAL")
getSymbols(Nasdaq100_Symbols)
# merge them together
nasdaq100 <- data.frame(as.xts(merge(AAPL, AAL)))
#tail(nasdaq100[,1:12],2)
#make percent difference column
nasdaq100$PD <- (((nasdaq100$AAPL.High - nasdaq100$AAPL.Open)/nasdaq100$AAPL.Open) * 100)
我尝试添加百分比差异列,但上述代码仅适用于AAPL符号(或您使用的任何符号),而不是为每个符号创建PD列。
在与xts合并之前,你是否必须以某种方式添加该列,或者我可以告诉R为新合并帧中的每个符号创建它吗?
编辑:我正在进行数据培训,所以我需要所有符号都是标题,例如:
AAPL.Ope AAPL.High AAPL.Volume AAL.Open AAL.High
1/3/2007 86.29 86.58 309579900 53.89 56.92
1/4/2007 84.05 85.95 211815100 56.3 59.15
1/5/2007 85.77 86.2 208685400 58.83 59.15
答案 0 :(得分:6)
根据我的经验,将财务数据保留为xts
个对象,以便将来使用其他技术指标等进行操作通常更有意义,除非您计划在caret
中运行预测模型在这种情况下,转换为data.frame
可能有意义。
考虑将数据符号保留为容器的元素,例如
update_sym_md <- function(sym, env = .GlobalEnv) {
x <- get(sym, env)
pd <- setNames((Hi(x) - Op(x)) / Op(x), "PD")
merge(x, pd)
}
# Adjust env for location of xts symbol data
l.syms <- lapply(Nasdaq100_Symbols, update_sym_md, env = .GlobalEnv)
lapply(l.syms, head)
# [[1]]
# AAPL.Open AAPL.High AAPL.Low AAPL.Close AAPL.Volume AAPL.Adjusted PD
# 2007-01-03 86.29 86.58 81.90 83.80 309579900 10.85709 0.003360760
# 2007-01-04 84.05 85.95 83.82 85.66 211815100 11.09807 0.022605556
# 2007-01-05 85.77 86.20 84.40 85.05 208685400 11.01904 0.005013373
# 2007-01-08 85.96 86.53 85.28 85.47 199276700 11.07345 0.006630991
# 2007-01-09 86.45 92.98 85.15 92.57 837324600 11.99333 0.075534942
# 2007-01-10 94.75 97.80 93.45 97.00 738220000 12.56728 0.032190006
#
# [[2]]
# AAL.Open AAL.High AAL.Low AAL.Close AAL.Volume AAL.Adjusted PD
# 2007-01-03 53.89 56.92 53.89 56.30 2955600 54.80361 0.0562256273
# 2007-01-04 56.30 59.15 53.65 58.84 2614500 57.27610 0.0506217238
# 2007-01-05 58.83 59.15 57.90 58.29 1656300 56.74072 0.0054394015
# 2007-01-08 57.30 60.48 57.04 57.93 2163200 56.39028 0.0554974006
# 2007-01-09 59.44 60.20 57.56 57.90 2098600 56.36108 0.0127860366
# 2007-01-10 60.03 60.04 57.34 58.93 3892200 57.36371 0.0001666167
此外,如果您确实希望比较一个xts对象中的符号的价格回报/原始价格,而不是比较data.frame,您可能会发现qmao
包有用。
例如:
install.packages("qmao", repos="http://R-Forge.R-project.org", type = "source")
library(qmao)
pf <- makePriceFrame(Nasdaq100_Symbols)
head(pf, 3)
# AAPL AAL
# 2007-01-03 10.85709 54.80361
# 2007-01-04 11.09807 57.27610
# 2007-01-05 11.01904 56.74072
rf <- makeReturnFrame(Nasdaq100_Symbols)
head(rf)
# AAPL AAL
# 2007-01-03 NA NA
# 2007-01-04 0.021952895 0.0441273684
# 2007-01-05 -0.007146715 -0.0093913155
# 2007-01-08 0.004926208 -0.0061951917
# 2007-01-09 0.079799692 -0.0005179716
# 2007-01-10 0.046745798 0.0176329011
更新以回应OP的评论:
要将所有数据合并为一行,请尝试以下操作:
(旁白:如果您要在此数据框架上使用非线性预测模型,请务必考虑先在每行中的证券中扩展您的数据点。)
x.cbind <- do.call(cbind, l.syms)
head(x.cbind)
# AAPL.Open AAPL.High AAPL.Low AAPL.Close AAPL.Volume AAPL.Adjusted PD AAL.Open AAL.High AAL.Low AAL.Close AAL.Volume AAL.Adjusted PD.1
# 2007-01-03 86.29 86.58 81.90 83.80 309579900 10.85709 0.003360760 53.89 56.92 53.89 56.30 2955600 54.80361 0.0562256273
# 2007-01-04 84.05 85.95 83.82 85.66 211815100 11.09807 0.022605556 56.30 59.15 53.65 58.84 2614500 57.27610 0.0506217238
# 2007-01-05 85.77 86.20 84.40 85.05 208685400 11.01904 0.005013373 58.83 59.15 57.90 58.29 1656300 56.74072 0.0054394015
# 2007-01-08 85.96 86.53 85.28 85.47 199276700 11.07345 0.006630991 57.30 60.48 57.04 57.93 2163200 56.39028 0.0554974006
# 2007-01-09 86.45 92.98 85.15 92.57 837324600 11.99333 0.075534942 59.44 60.20 57.56 57.90 2098600 56.36108 0.0127860366
# 2007-01-10 94.75 97.80 93.45 97.00 738220000 12.56728 0.032190006 60.03 60.04 57.34 58.93 3892200 57.36371 0.0001666167
df.cbind <- data.frame("time" = index(x.cbind), coredata(x.cbind))
head(df.cbind)
# time AAPL.Open AAPL.High AAPL.Low AAPL.Close AAPL.Volume AAPL.Adjusted PD AAL.Open AAL.High AAL.Low AAL.Close AAL.Volume AAL.Adjusted PD.1
# 1 2007-01-03 86.29 86.58 81.90 83.80 309579900 10.85709 0.003360760 53.89 56.92 53.89 56.30 2955600 54.80361 0.0562256273
# 2 2007-01-04 84.05 85.95 83.82 85.66 211815100 11.09807 0.022605556 56.30 59.15 53.65 58.84 2614500 57.27610 0.0506217238
# 3 2007-01-05 85.77 86.20 84.40 85.05 208685400 11.01904 0.005013373 58.83 59.15 57.90 58.29 1656300 56.74072 0.0054394015
# 4 2007-01-08 85.96 86.53 85.28 85.47 199276700 11.07345 0.006630991 57.30 60.48 57.04 57.93 2163200 56.39028 0.0554974006
# 5 2007-01-09 86.45 92.98 85.15 92.57 837324600 11.99333 0.075534942 59.44 60.20 57.56 57.90 2098600 56.36108 0.0127860366
# 6 2007-01-10 94.75 97.80 93.45 97.00 738220000 12.56728 0.032190006 60.03 60.04 57.34 58.93 3892200 57.36371 0.0001666167
为了更好地理解qmao
函数的工作原理,为什么不查看文档中的示例并从那里开始呢?
?makeReturnFrame
查看源代码,真正了解正在发生的事情(通过学习良好的编码风格,同时成为更好的R程序员)
答案 1 :(得分:2)
我想使用相当新的有趣的包tidyquant
添加一个解决方案,这对于像这样的任务非常有用。你可以使用来自tidyverse的所有东西,它可以利用xts,quantmod和TTR的定量功能!通过许多示例查看vignette。
library(tidyquant)
c("AAPL", "AAL") %>%
tq_get(get = "stock.prices") %>%
group_by(symbol.x) %>%
tq_mutate(ohlc_fun = OHLCV, mutate_fun = OpCl,col_rename='diff') %>%
select(- c(low,volume)) # I deselect low and volume to show the added colum ‘diff'
Source: local data frame [5,110 x 7]
Groups: symbol.x [2]
symbol.x date open high close adjusted diff
<chr> <date> <dbl> <dbl> <dbl> <dbl> <dbl>
1 AAPL 2007-01-03 86.29 86.58 83.80 10.85709 0.0033607603
2 AAPL 2007-01-04 84.05 85.95 85.66 11.09807 0.0226055559
3 AAPL 2007-01-05 85.77 86.20 85.05 11.01904 0.0050133730
4 AAPL 2007-01-08 85.96 86.53 85.47 11.07345 0.0066309913
5 AAPL 2007-01-09 86.45 92.98 92.57 11.99333 0.0755349424
更新:要求问题在列名中添加符号
假设您将上述数据框保存为变量stocks
:
lapply(unique(stocks$symbol.x), function(x) stocks[stocks$symbol.x == x,]) %>%
lapply( function(x) {
names(x) <- paste0(x$symbol.x[1],'.',colnames(x))
x
})
[[1]]
Source: local data frame [2,555 x 7]
Groups: symbol.x [1]
AAPL.symbol.x AAPL.date AAPL.open AAPL.high AAPL.close
<chr> <date> <dbl> <dbl> <dbl>
1 AAPL 2007-01-03 86.29 86.58 83.80
2 AAPL 2007-01-04 84.05 85.95 85.66
3 AAPL 2007-01-05 85.77 86.20 85.05
4 AAPL 2007-01-08 85.96 86.53 85.47
5 AAPL 2007-01-09 86.45 92.98 92.57
6 AAPL 2007-01-10 94.75 97.80 97.00
7 AAPL 2007-01-11 95.94 96.78 95.80
8 AAPL 2007-01-12 94.59 95.06 94.62
9 AAPL 2007-01-16 95.68 97.25 97.10
10 AAPL 2007-01-17 97.56 97.60 94.95
# ... with 2,545 more rows, and 2 more variables:
# AAPL.adjusted <dbl>, AAPL.diff <dbl>
[[2]]
Source: local data frame [2,555 x 7]
Groups: symbol.x [1]
AAL.symbol.x AAL.date AAL.open AAL.high AAL.close
<chr> <date> <dbl> <dbl> <dbl>
1 AAL 2007-01-03 53.89 56.92 56.30
2 AAL 2007-01-04 56.30 59.15 58.84
3 AAL 2007-01-05 58.83 59.15 58.29
4 AAL 2007-01-08 57.30 60.48 57.93
5 AAL 2007-01-09 59.44 60.20 57.90
6 AAL 2007-01-10 60.03 60.04 58.93
7 AAL 2007-01-11 59.18 61.20 61.20
8 AAL 2007-01-12 61.20 62.50 60.81
9 AAL 2007-01-16 60.81 62.10 61.96
10 AAL 2007-01-17 60.96 61.89 58.65
# ... with 2,545 more rows, and 2 more variables:
# AAL.adjusted <dbl>, AAL.diff <dbl>
答案 2 :(得分:1)
当我使用quantmod
getSymbols
函数时,我通常会做的是编写自己的小包装函数,就像这样:
library(quantmod)
# 1. write the wrapper function
my_wrapper <- function(tickers, from, to) {
result_list <- lapply(tickers, function(ticker) {
tmp <- getSymbols(ticker, from = from, to = to, auto.assign = F)
tmp_df <- data.frame(date = index(tmp),
ticker = ticker,
open = as.numeric(Op(tmp)),
high = as.numeric(Hi(tmp)),
close = as.numeric(Cl(tmp)),
adj = as.numeric(Ad(tmp)))
})
result_df <- do.call(rbind, result_list)
return(result_df)
}
# 2. download and inspect the data
my_df <- my_wrapper(c("AAPL", "AAL"), from = "2010-01-01", to = "2016-12-31")
summary(my_df)
# date ticker open high close adj
# Min. :2010-01-04 AAPL:1762 Min. : 3.99 Min. : 4.06 Min. : 4.00 Min. : 3.894
# 1st Qu.:2011-09-30 AAL :1762 1st Qu.: 17.16 1st Qu.: 17.46 1st Qu.: 17.23 1st Qu.: 16.770
# Median :2013-07-04 Median : 72.94 Median : 73.45 Median : 73.02 Median : 41.920
# Mean :2013-07-02 Mean :168.48 Mean :170.10 Mean :168.40 Mean : 49.208
# 3rd Qu.:2015-04-06 3rd Qu.:318.11 3rd Qu.:320.39 3rd Qu.:318.23 3rd Qu.: 72.969
# Max. :2016-12-30 Max. :702.41 Max. :705.07 Max. :702.10 Max. :127.966
然后,为了计算差异,我建议使用dplyr
或data.table
或其他一些数据框架操作包。我在这里使用dplyr。
# 3. Calculate the difference using dplyr
library(dplyr)
my_rets <- my_df %>% group_by(ticker) %>% mutate(pd = (high - open) / open)
my_rets
# Source: local data frame [3,524 x 7]
# Groups: ticker [2]
#
# date ticker open high close adj pd
# <date> <fctr> <dbl> <dbl> <dbl> <dbl> <dbl>
# 1 2010-01-04 AAPL 213.43 214.50 214.01 27.72704 0.0050133440
# 2 2010-01-05 AAPL 214.60 215.59 214.38 27.77498 0.0046132153
# 3 2010-01-06 AAPL 214.38 215.23 210.97 27.33318 0.0039649549
# 4 2010-01-07 AAPL 211.75 212.00 210.58 27.28265 0.0011806659
# 5 2010-01-08 AAPL 210.30 212.00 211.98 27.46403 0.0080837473
# 6 2010-01-11 AAPL 212.80 213.00 210.11 27.22176 0.0009398731
# 7 2010-01-12 AAPL 209.19 209.77 207.72 26.91211 0.0027725991
# 8 2010-01-13 AAPL 207.87 210.93 210.65 27.29172 0.0147206905
# 9 2010-01-14 AAPL 210.11 210.46 209.43 27.13366 0.0016657655
# 10 2010-01-15 AAPL 210.93 211.60 205.93 26.68020 0.0031764188
# # ... with 3,514 more rows
P.S。你在这里找到了一个很好的dplyr介绍:https://cran.rstudio.com/web/packages/dplyr/vignettes/introduction.html