How do I merge stock data from two different data sets?

Posted: 2014-01-17 03:14:54

Tags: r csv merge xts cbind

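I have two data sets, AAPL and AMZN, that I want to merge, but I'm finding it hard because merge and cbind don't do what I want. I think the problem is that the data sets are being treated as data.frames, but I'm not sure.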

The data looks like this:

      Date Time   Open   High    Low  Close  Volume
1 12/14/12 9:30 514.75 515.10 512.72 512.86 2504264
2 12/14/12 9:31 512.80 513.00 510.00 510.17  574498
3 12/14/12 9:32 510.04 511.70 509.11 511.26  673126
4 12/14/12 9:33 511.26 511.54 508.82 509.25  477914
5 12/14/12 9:34 509.03 510.65 508.50 510.54  432689

Desired result:

    Date Time   Open   High    Low  Close Volume
12/14/12 9:30 250.11 250.64 250.07 250.37  38249
12/14/12 9:31 250.60 250.60 250.16 250.51   6954
12/14/12 9:32 250.47 250.72 250.43 250.72   3843
12/14/12 9:33 250.69 250.70 250.44 250.50   3990
12/14/12 9:34 250.46 250.64 250.21 250.31   4490

    Date Time   Open   High    Low  Close Volume
12/14/12 9:31 512.80 513.00 510.00 510.17 574498
12/14/12 9:32 510.04 511.70 509.11 511.26 673126
12/14/12 9:33 511.26 511.54 508.82 509.25 477914
12/14/12 9:34 509.03 510.65 508.50 510.54 432689

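Basically, I want to merge the two data sets on Date and Time so they sit side by side (which I can't lay out here). I tried converting each data set to xts, but I'm not sure I did it correctly: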

AAPL <- read.csv("aapl1.csv",header=TRUE)
AMZN <- read.csv("amzn1.csv",header=TRUE)
aapl <- xts(AAPL[,c(3:7)], AAPL$DATETIME <-as.POSIXct(paste(AAPL$Date,AAPL$Time), format=""%m/%d/%Y %H:%M"))
amzn <- xts(AMZN[,c(3:7)], AMZN$DATETIME <-as.POSIXct(paste(AMZN$Date,AMZN$Time), format=""%m/%d/%Y %H:%M"))

When I use cbind, merge, or even join, the data sets don't merge.

3 answers:

Answer 0 (score: 2)

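If your xts objects are indexed by date-time (as they should be), just pass both of them to merge. Here I merge one set with itself, since your question doesn't include sample data: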

library(xts)
data(sample_matrix)
sample.xts <- as.xts(head(sample_matrix), descr='my new xts object') # From ?xts

merge(sample.xts, sample.xts)
##                Open     High      Low    Close   Open.1   High.1    Low.1  Close.1
## 2007-01-02 50.03978 50.11778 49.95041 50.11778 50.03978 50.11778 49.95041 50.11778
## 2007-01-03 50.23050 50.42188 50.23050 50.39767 50.23050 50.42188 50.23050 50.39767
## 2007-01-04 50.42096 50.42096 50.26414 50.33236 50.42096 50.42096 50.26414 50.33236
## 2007-01-05 50.37347 50.37347 50.22103 50.33459 50.37347 50.37347 50.22103 50.33459
## 2007-01-06 50.24433 50.24433 50.11121 50.18112 50.24433 50.24433 50.11121 50.18112
## 2007-01-07 50.13211 50.21561 49.99185 49.99185 50.13211 50.21561 49.99185 49.99185

This works because merge dispatches to merge.xts for these objects.
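To see what merge.xts does when the timestamps don't line up (the situation in the question, where one series starts at 9:30 and the other at 9:31), here is a minimal sketch with two tiny hand-built series. The prices come from the sample data; the object names and the UTC time zone are illustrative assumptions.

```r
library(xts)

# Hypothetical three- and two-minute series built from the sample Close prices
t_aapl <- as.POSIXct(c("2012-12-14 09:30", "2012-12-14 09:31", "2012-12-14 09:32"),
                     format = "%Y-%m-%d %H:%M", tz = "UTC")
t_amzn <- as.POSIXct(c("2012-12-14 09:31", "2012-12-14 09:32"),
                     format = "%Y-%m-%d %H:%M", tz = "UTC")
aapl <- xts(c(250.37, 250.51, 250.72), order.by = t_aapl)
amzn <- xts(c(510.17, 511.26), order.by = t_amzn)
colnames(aapl) <- "Close.AAPL"
colnames(amzn) <- "Close.AMZN"

# Default join = "outer": every timestamp from either series, NA where one is missing
outer <- merge(aapl, amzn)

# join = "inner": only the timestamps present in both series
inner <- merge(aapl, amzn, join = "inner")
```

The outer result has three rows with NA for AMZN at 9:30; the inner result drops that minute and keeps two rows.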

Here is a merge of your sample data without using xts. First, let's read the data into the R session:

AAPL <- read.table(header=T, text='Date Time Open High Low Close Volume
12/14/12 9:30 250.11 250.64 250.07 250.37 38249
12/14/12 9:31 250.60 250.60 250.16 250.51 6954
12/14/12 9:32 250.47 250.72 250.43 250.72 3843
12/14/12 9:33 250.69 250.70 250.44 250.50 3990
12/14/12 9:34 250.46 250.64 250.21 250.31 4490')

AMZN <- read.table(header=T, text='Date Time Open High Low Close Volume
12/14/12 9:31 512.80 513.00 510.00 510.17 574498
12/14/12 9:32 510.04 511.70 509.11 511.26 673126
12/14/12 9:33 511.26 511.54 508.82 509.25 477914
12/14/12 9:34 509.03 510.65 508.50 510.54 432689')

These are now objects of class data.frame and can be merged on the Date and Time columns:

merge(AAPL, AMZN, by=c('Date', 'Time'), all=T, suffixes = c('.AAPL', '.AMZN'))
##       Date Time Open.AAPL High.AAPL Low.AAPL Close.AAPL Volume.AAPL Open.AMZN High.AMZN Low.AMZN Close.AMZN Volume.AMZN
## 1 12/14/12 9:30    250.11    250.64   250.07     250.37       38249        NA        NA       NA         NA          NA
## 2 12/14/12 9:31    250.60    250.60   250.16     250.51        6954    512.80    513.00   510.00     510.17      574498
## 3 12/14/12 9:32    250.47    250.72   250.43     250.72        3843    510.04    511.70   509.11     511.26      673126
## 4 12/14/12 9:33    250.69    250.70   250.44     250.50        3990    511.26    511.54   508.82     509.25      477914
## 5 12/14/12 9:34    250.46    250.64   250.21     250.31        4490    509.03    510.65   508.50     510.54      432689
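The call above uses all=T, an outer join, which is why the 9:30 row has NA in every AMZN column. As a minimal sketch of the other common choices (the three-row data frames below are hypothetical stand-ins that keep only Close for brevity), leaving all at its default of FALSE gives an inner join, and all.x = TRUE gives a left join:

```r
# Small stand-ins in the same layout as the question, Close column only
aapl <- data.frame(Date = "12/14/12", Time = c("9:30", "9:31", "9:32"),
                   Close = c(250.37, 250.51, 250.72))
amzn <- data.frame(Date = "12/14/12", Time = c("9:31", "9:32"),
                   Close = c(510.17, 511.26))

# Default all = FALSE: inner join, only minutes present in both sets survive
inner <- merge(aapl, amzn, by = c("Date", "Time"),
               suffixes = c(".AAPL", ".AMZN"))

# all.x = TRUE: left join, every AAPL minute is kept and AMZN is filled with NA
left <- merge(aapl, amzn, by = c("Date", "Time"), all.x = TRUE,
              suffixes = c(".AAPL", ".AMZN"))
```

Because Date and Time are still character strings here, merge() sorts the result lexically, so a 10:00 bar would sort before 9:30; parsing them into POSIXct, as the xts-based answers do, avoids that.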

Answer 1 (score: 1)

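A second option is join() from the plyr package. Unlike merge(), it preserves the row order of the first data frame, but it offers fewer options (for example, it has no equivalent of merge()'s suffixes argument for renaming overlapping columns). It is often faster than merge(), which makes it worth considering for very large data sets.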

library(plyr)
# type defaults to "left": every row of AAPL is kept
join(AAPL, AMZN, by = c("Date", "Time"))
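Because join() cannot add suffixes, two data frames that both have a Close column can end up with ambiguous column names. A minimal sketch, assuming plyr is installed and using hypothetical three-row stand-ins for the data: rename the overlapping column first, then request a full join with type = "full".

```r
library(plyr)

# Hypothetical stand-ins in the question's layout, Close column only
aapl <- data.frame(Date = "12/14/12", Time = c("9:30", "9:31", "9:32"),
                   Close = c(250.37, 250.51, 250.72))
amzn <- data.frame(Date = "12/14/12", Time = c("9:31", "9:32"),
                   Close = c(510.17, 511.26))

# join() has no suffixes argument, so make the non-key names distinct first
names(aapl)[names(aapl) == "Close"] <- "Close.AAPL"
names(amzn)[names(amzn) == "Close"] <- "Close.AMZN"

# type = "full" keeps minutes from both sets; rows of aapl keep their order
both <- join(aapl, amzn, by = c("Date", "Time"), type = "full")
```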

Answer 2 (score: 1)

Once you fix a few problems in your code, converting to xts and using merge works.

library(xts)
AAPL <- read.csv("aapl1.csv",header=TRUE)
AMZN <- read.csv("amzn1.csv",header=TRUE)
# your code is easier to understand if you create these columns outside of the
# xts constructor. Note that your `format` was incorrect. You need %y
# (2-digit year), not %Y (4-digit year). You also had unmatched quotes.
AAPL$DATETIME <- as.POSIXct(paste(AAPL$Date,AAPL$Time), format="%m/%d/%y %H:%M")
AMZN$DATETIME <- as.POSIXct(paste(AMZN$Date,AMZN$Time), format="%m/%d/%y %H:%M")
# create xts objects and merge
aapl <- xts(AAPL[,c(3:7)], AAPL$DATETIME)
amzn <- xts(AMZN[,c(3:7)], AMZN$DATETIME)
aapl.amzn <- merge(aapl,amzn)
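Since aapl1.csv and amzn1.csv aren't available here, the following sketch repeats the same steps with a few inline rows of the sample data, using read.csv(text = ...) in place of the files. Two details are additions, not part of the answer: tz = "UTC", so the parsed times don't depend on the machine's time zone, and a ticker prefix on the column names, so the merged columns read AAPL.Open and AMZN.Open rather than Open and Open.1.

```r
library(xts)

# Inline stand-ins for aapl1.csv / amzn1.csv
AAPL <- read.csv(text = "Date,Time,Open,High,Low,Close,Volume
12/14/12,9:30,250.11,250.64,250.07,250.37,38249
12/14/12,9:31,250.60,250.60,250.16,250.51,6954
12/14/12,9:32,250.47,250.72,250.43,250.72,3843")
AMZN <- read.csv(text = "Date,Time,Open,High,Low,Close,Volume
12/14/12,9:31,512.80,513.00,510.00,510.17,574498
12/14/12,9:32,510.04,511.70,509.11,511.26,673126")

# %y reads the 2-digit year "12" as 2012
AAPL$DATETIME <- as.POSIXct(paste(AAPL$Date, AAPL$Time),
                            format = "%m/%d/%y %H:%M", tz = "UTC")
AMZN$DATETIME <- as.POSIXct(paste(AMZN$Date, AMZN$Time),
                            format = "%m/%d/%y %H:%M", tz = "UTC")

# Columns 3:7 are Open through Volume
aapl <- xts(AAPL[, 3:7], order.by = AAPL$DATETIME)
amzn <- xts(AMZN[, 3:7], order.by = AMZN$DATETIME)

# Prefix with the ticker so the merged column names are self-describing
colnames(aapl) <- paste0("AAPL.", colnames(aapl))
colnames(amzn) <- paste0("AMZN.", colnames(amzn))

# Outer join on the time index: AMZN columns are NA at 9:30
aapl.amzn <- merge(aapl, amzn)
```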