Merging two data frames in R for backtesting

Time: 2017-12-17 16:43:45

Tags: r dataframe merge

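I want to merge two data frames that contain time series for individual stocks, where each column holds the data for one stock. Dataframe 1 has the stock prices and Dataframe 2 has the P/E ratios. My goal is to prepare a data frame that I can use with the backtest package, which needs a data frame in this format: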

library('backtest')
data(starmine)

Its structure is as follows:

date  PRICE  symbol
date1 4.2    AAPL
date1 6.3    MSFT
date1 2.2    GE
date2 4.1    AAPL
date2 6.3    MSFT
date2 2.5    GE

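So the dataset is grouped by month. My data consists of several data frames, each of which contains one variable of interest (e.g. price, P/E ratio, etc.) for all stocks and all dates. An example: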

dates <- seq(as.Date("1995/1/1"), by = "month", length.out = 10)

a = sample(0:1,10,rep=TRUE) 
b = sample(0:1,10,rep=TRUE)
c = sample(0:1,10,rep=TRUE)
prices = data.frame(dates,a,b,c)       

a = sample(0:1,10,rep=TRUE) 
b = sample(0:1,10,rep=TRUE)
c = sample(0:1,10,rep=TRUE)
pe = data.frame(dates,a,b,c)       

Can anyone tell me how to merge df1 and df2 to get the same structure as starmine? I was thinking of something like this:

> total <- merge(df1,df2,by=colnames)
Error in as.vector(x, mode) : 
cannot coerce type 'closure' to vector of type 'any'

This is the structure I want to obtain:

date     price  pe      symbol
1995/1/1 4.2    0.5     a
1995/1/1 6.3    0.4     b
1995/1/1 2.2    0.3     c
1995/2/1 4.1    0.4     a
1995/2/1 6.3    0.2     b
1995/2/1 2.5    0.1     c
1995/3/1 4.2    0.5     a
1995/3/1 6.3    0.4     b
1995/3/1 2.2    0.3     c
1995/4/1 4.1    0.4     a
1995/4/1 6.3    0.2     b
1995/4/1 2.5    0.1     c

1 answer:

Answer 0: (score: 1)

# example data
dates <- seq(as.Date("1995/1/1"), by = "month", length.out = 10)

a = sample(0:1,10,rep=TRUE) 
b = sample(0:1,10,rep=TRUE)
c = sample(0:1,10,rep=TRUE)
prices = data.frame(dates,a,b,c)       

a = sample(0:1,10,rep=TRUE) 
b = sample(0:1,10,rep=TRUE)
c = sample(0:1,10,rep=TRUE)
pe = data.frame(dates,a,b,c)     

library(dplyr)
library(tidyr)

# add dataset name as a column
prices$name = "price"
pe$name = "pe"

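# stack both data frames, reshape the symbol columns (a, b, c) into long format,
# then spread the dataset name into separate pe / price columns
tbl_df(rbind(prices, pe)) %>%
  gather(symbol, value, -dates, -name) %>%
  spread(name, value)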

# # A tibble: 30 x 4
#        dates symbol    pe price
# *     <date>  <chr> <int> <int>
# 1 1995-01-01      a     1     0
# 2 1995-01-01      b     0     1
# 3 1995-01-01      c     0     0
# 4 1995-02-01      a     0     0
# 5 1995-02-01      b     0     1
# 6 1995-02-01      c     0     1
# 7 1995-03-01      a     0     0
# 8 1995-03-01      b     1     0
# 9 1995-03-01      c     0     0
# 10 1995-04-01     a     0     1
# # ... with 20 more rows

I used tbl_df(rbind(prices, pe)) only for display purposes. You don't really need tbl_df(), so you can use rbind(prices, pe) instead.
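
For comparison, here is a minimal sketch of the same reshaping in base R, with no extra packages. It also shows a merge() call that works: the error in the question most likely comes from passing the function colnames itself as by, whereas by expects a character vector of column names. The helper to_long(), the set.seed() call, and the use of runif() for the example values are assumptions made for illustration; they are not part of the original question or answer.

```r
# Example data in the same wide layout as the question (one column per symbol).
set.seed(1)  # assumption: fixed seed so the sketch is reproducible
dates  <- seq(as.Date("1995/1/1"), by = "month", length.out = 10)
prices <- data.frame(dates, a = runif(10), b = runif(10), c = runif(10))
pe     <- data.frame(dates, a = runif(10), b = runif(10), c = runif(10))

# Hypothetical helper: turn one wide data frame into long format with the
# columns dates, symbol and a value column named by `value_name`.
to_long <- function(df, value_name) {
  symbols <- setdiff(names(df), "dates")
  long <- data.frame(
    dates  = rep(df$dates, times = length(symbols)),      # dates repeated once per symbol
    symbol = rep(symbols, each = nrow(df)),               # each symbol repeated once per date
    value  = unlist(df[symbols], use.names = FALSE),      # columns concatenated in symbol order
    stringsAsFactors = FALSE
  )
  names(long)[names(long) == "value"] <- value_name
  long
}

# `by` takes a character vector of column names, not the function `colnames`.
total <- merge(to_long(prices, "price"), to_long(pe, "pe"),
               by = c("dates", "symbol"))

# Order by date, then symbol, to match the layout requested in the question.
total <- total[order(total$dates, total$symbol), ]
head(total)
```

The result has one row per date and symbol, with price and pe as separate columns. With more variables (for example a third data frame), the same pattern should extend by calling to_long() on each one and merging the results one at a time on the same two key columns.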