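I have price data for a spot market and for a forward market, and I want to compare their prices on the same days. However, the forward market has 1826 daily observations while the spot market has only 1822, and I don't know which 4 days are missing from the spot market. Could you give me some syntax for finding the 4 observations that are in the forward market but not in the spot market? I can't just delete 4 random observations; the days have to match.
Thanks in advance, everyone.
Have a nice day!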
Answer 0: (score: 1)
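Consider setdiff, something like:
setdiff(forward, spot)
where forward and spot are the respective date vectors. This should give you the days that appear in forward but not in spot.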
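Note that `setdiff` is one-directional, so it can be worth running it both ways. The following is a minimal sketch on two short made-up date vectors; `forward_dates`, `spot_dates`, and the helper `to_date` are illustrative names, not objects from the question. The result is passed through `as.Date` with an explicit origin because `setdiff` may hand the dates back as plain day counts.

```r
# Toy data: ten consecutive days, with the 3rd and 7th missing from spot
forward_dates <- as.Date("2015-01-01") + 0:9
spot_dates <- forward_dates[-c(3, 7)]

# Hypothetical helper: restore the Date class if setdiff dropped it
to_date <- function(x) as.Date(x, origin = "1970-01-01")

# Days present in forward but absent from spot
only_in_forward <- to_date(setdiff(forward_dates, spot_dates))

# The reverse direction; empty here, since spot is a subset of forward
only_in_spot <- to_date(setdiff(spot_dates, forward_dates))

print(only_in_forward)
print(length(only_in_spot))
```

If the second result is not empty, the spot data contains days that the forward data lacks, and trimming only one of the two tables will not line them up.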
Answer 1: (score: 0)
Something like this will drop the rows that don't match:
forwards <- forwards[ forwards$date %in% spot$date, ]
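As a rough illustration of the same `%in%` filter, here is a sketch on two small made-up data frames (the prices are invented for the example). It applies the filter in both directions so that both tables end up with the same dates, and it also records which dates were dropped rather than discarding them silently.

```r
# Toy data: six forward days, four of which also appear in spot
forwards <- data.frame(
  date  = as.Date("2015-01-01") + 0:5,
  price = c(10.1, 10.3, 10.2, 10.6, 10.4, 10.5)
)
spot <- data.frame(
  date  = as.Date("2015-01-01") + c(0, 1, 3, 5),
  price = c(9.9, 10.0, 10.4, 10.2)
)

# Record the dates that are about to be removed from forwards
dropped <- forwards$date[!forwards$date %in% spot$date]

# Keep only the rows whose date appears in the other table
forwards_matched <- forwards[forwards$date %in% spot$date, ]
spot_matched <- spot[spot$date %in% forwards$date, ]

print(dropped)
print(nrow(forwards_matched))
```

Unlike `setdiff`, subsetting with `%in%` keeps the `Date` class, so `dropped` prints as readable dates.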
Answer 2: (score: 0)
I prefer data.table. For an introduction to the syntax, see Getting Started.
library(data.table)
# use setDT(forwards) on your own object to convert an existing data.frame
set.seed(20384)
all_dates = seq.Date(as.Date('2011-01-01'), as.Date('2015-12-31'), by = 'day')
TT = length(all_dates)
forward = data.table(
date = all_dates,
price_forward = rnorm(TT)
)
forward
# date price_forward
# 1: 2011-01-01 -0.0969564
# 2: 2011-01-02 -0.1079899
# 3: 2011-01-03 1.9454087
# 4: 2011-01-04 0.5079781
# 5: 2011-01-05 0.2201317
# ---
# 1822: 2015-12-27 -0.3674510
# 1823: 2015-12-28 -1.5389197
# 1824: 2015-12-29 -0.8461961
# 1825: 2015-12-30 -0.7018287
# 1826: 2015-12-31 -0.5643040
spot = data.table(
# simulate removing 4 of the dates at random
date = sample(all_dates, TT - 4L),
price_spot = rnorm(TT - 4),
# setting key is not necessary, but it will sort the data
key = 'date'
)
spot
# date price_spot
# 1: 2011-01-01 0.33803547
# 2: 2011-01-02 -1.21756020
# 3: 2011-01-03 0.13199130
# 4: 2011-01-04 -0.64201342
# 5: 2011-01-05 -0.08061704
# ---
# 1818: 2015-12-27 1.83826974
# 1819: 2015-12-28 0.22838840
# 1820: 2015-12-29 -0.93258147
# 1821: 2015-12-30 -1.20209606
# 1822: 2015-12-31 -1.80698627
Identifying the missing dates is simple, e.g.:
setdiff(forward$date, spot$date)
# [1] 15212 15598 16188 16752
# strangely, setdiff strips the Date of its human-readable label
# and converts the dates to integers (read ?Date for more);
# we can recover the useful version with (note that this version
# is inefficient since we convert to character every Date object,
# when we only care about the character version of a few -- we
# would do better to convert the result of setdiff):
setdiff(paste(forward$date), paste(spot$date))
# [1] "2011-08-26" "2012-09-15" "2014-04-28" "2015-11-13"
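The comment above suggests converting only the result of `setdiff` instead of every input date. A minimal sketch of that variant, assuming the day counts are relative to 1970-01-01 (the origin R uses for `Date`), might look like this. It rebuilds similar data with its own seed and plain vectors (`forward_dates`, `spot_dates`, and `missing_days` are illustrative names), so the four dates it finds will differ from the ones shown above.

```r
set.seed(1)
all_dates <- seq(as.Date("2011-01-01"), as.Date("2015-12-31"), by = "day")

forward_dates <- all_dates
# Simulate a spot series that lacks 4 randomly chosen days
spot_dates <- sort(sample(all_dates, length(all_dates) - 4L))

# Run setdiff on the dates directly, then convert only the 4 results
missing_days <- as.Date(setdiff(forward_dates, spot_dates),
                        origin = "1970-01-01")

print(missing_days)
```

This gives the same dates as the `paste` version while touching only four values after the comparison.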
But this may not be all you want to do; joining the tables may be more useful:
prices = merge(forward, spot, all.x = TRUE, by = 'date')
# once merged, we can use is.na to identify the missing dates:
prices[is.na(price_spot)]
# date price_forward price_spot
# 1: 2011-08-26 0.008345504 NA
# 2: 2012-09-15 -0.966410632 NA
# 3: 2014-04-28 -1.600574836 NA
# 4: 2015-11-13 1.549928470 NA
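Since the stated goal is to compare prices on matching days, an inner join may be a convenient final step. The sketch below uses base `data.frame` objects with invented prices so that it runs without extra packages; `merge` accepts the same `by` argument for data.table objects. The `spread` column is a hypothetical name added purely for illustration.

```r
# Toy data: six forward days, four of which also appear in spot
forward <- data.frame(
  date          = as.Date("2015-01-01") + 0:5,
  price_forward = c(10.1, 10.3, 10.2, 10.6, 10.4, 10.5)
)
spot <- data.frame(
  date       = as.Date("2015-01-01") + c(0, 1, 3, 5),
  price_spot = c(9.9, 10.0, 10.4, 10.2)
)

# Without all.x = TRUE, merge keeps only dates present in both tables
matched <- merge(forward, spot, by = "date")

# Hypothetical comparison column: forward price minus spot price
matched$spread <- matched$price_forward - matched$price_spot

print(matched)
```

The left join with `all.x = TRUE` shows which days are missing; the inner join gives a table that is ready for a day-by-day comparison.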