Extracting data from one data frame into another in R

Time: 2020-10-01 15:16:29

Tags: r finance

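I have a data frame containing several years of daily stock exchange prices together with their dates. For each month, I would like to extract the last 3 observations of that month and the first 5 observations of the following month, and store them in a new data frame.

Besides the date (formatted as "%Y-%m-%d"), I also have a column that counts the trading days within each month. Sample data: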

    # Create the data frame first, then add the remaining columns
    df <- data.frame(
      date = as.Date(c("2017-03-25","2017-03-26","2017-03-27","2017-03-29","2017-03-30",
                       "2017-03-31","2017-04-03","2017-04-04","2017-04-05","2017-04-06",
                       "2017-04-07","2017-04-08","2017-04-09"))
    )

    # Counter of trading days within each month
    df$DayofMonth <- c(18,19,20,21,22,23,1,2,3,4,5,6,7)

    df$price <- c(100, 100.53, 101.3, 100.94, 101.42, 101.40, 101.85, 102, 101.9, 102, 102.31, 102.1, 102.23)

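Now I want to extract the last 3 observations of March and the first 5 observations of April (then the last 3 observations of April and the first 5 of May, and so on, including all columns of the corresponding rows) and store them in a new data frame. The only question is: how do I do that?

Thanks for your help!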

2 Answers:

Answer 0: (score: 1)

A first idea:

    library(data.table)
    library(stringr)

    date <- c("2017-03-25","2017-03-26","2017-03-27","2017-03-29","2017-03-30",
              "2017-03-31","2017-04-03","2017-04-04","2017-04-05","2017-04-06",
              "2017-04-07","2017-04-08","2017-04-09")

    df <- data.table(Date = date)

    # Month key ("YYYY-MM") and a per-month trading-day counter
    df[, YearMonth := str_sub(Date, 1, 7)]
    df[, DayofMonth := seq_len(.N), by = YearMonth]

    # First 5 trading days of each month (fewer if the month is shorter)
    first <- df[, head(.SD, 5), by = YearMonth]
    # Last 3 trading days of each month
    last <- df[, tail(.SD, 3), by = YearMonth]

    final <- rbind(first, last)
    setorder(final, Date)

    # A month with fewer than 8 trading days produces duplicate rows;
    # unique() removes them.
    final <- unique(final)
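
The result above pools the first and last trading days of every month into one table. If each block (the last 3 days of one month plus the first 5 of the next) should stay identifiable, a minimal base R sketch could look like the following. The helper name `extract_windows` and the `window` column are hypothetical. The sketch assumes the data frame is sorted by date, has a `Date`-class column named `date`, and contains no missing months, since it pairs neighbouring months as they appear in the data.

```r
# Hypothetical helper: for each pair of neighbouring months found in the data,
# take the last n_last rows of the earlier month and the first n_first rows of
# the later month. Returns NULL when fewer than two months are present.
extract_windows <- function(df, n_last = 3, n_first = 5) {
  ym <- format(df$date, "%Y-%m")   # month key, e.g. "2017-03"
  months <- unique(ym)             # in order of appearance (data assumed sorted)
  blocks <- lapply(seq_len(length(months) - 1), function(i) {
    block <- rbind(
      tail(df[ym == months[i], , drop = FALSE], n_last),
      head(df[ym == months[i + 1], , drop = FALSE], n_first)
    )
    # Label the block so that windows can be told apart later
    block$window <- paste(months[i], months[i + 1], sep = "/")
    block
  })
  do.call(rbind, blocks)
}

# Sample data from the question
df <- data.frame(
  date = as.Date(c("2017-03-25", "2017-03-26", "2017-03-27", "2017-03-29",
                   "2017-03-30", "2017-03-31", "2017-04-03", "2017-04-04",
                   "2017-04-05", "2017-04-06", "2017-04-07", "2017-04-08",
                   "2017-04-09")),
  DayofMonth = c(18, 19, 20, 21, 22, 23, 1, 2, 3, 4, 5, 6, 7),
  price = c(100, 100.53, 101.3, 100.94, 101.42, 101.40, 101.85,
            102, 101.9, 102, 102.31, 102.1, 102.23)
)

windows <- extract_windows(df)
```

On the sample data this yields a single window labelled "2017-03/2017-04" with eight rows, from 2017-03-29 through 2017-04-07. A row can appear in two windows when a month has fewer than 8 trading days, which is intended here because each window is kept whole.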

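Answer 1: (score: 0)

Quick and dirty: add a column like DayofMonth, but shifted by 3 rows, so that each row holds the DayofMonth value from 3 rows further down.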

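    # dom2[i] is DayofMonth[i + 3]; indexing past the end yields NA
    df$dom2 <- df$DayofMonth[4:(nrow(df)+3)]
    # Keep the first 5 days of a month, or the last 3 days of the previous one
    subset(df, DayofMonth<=5 | dom2<=3)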

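The only reason we still filter on the actual DayofMonth column (rather than just saying dom2 <= 8) is that the end of dom2 will contain NAs for your example. I don't know what your real data looks like, but better safe than sorry.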
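
To make the shifted-column trick concrete, here is a self-contained run on the question's sample data (price omitted for brevity). The variable name `result` is introduced only for illustration.

```r
# Sample data from the question
df <- data.frame(
  date = as.Date(c("2017-03-25", "2017-03-26", "2017-03-27", "2017-03-29",
                   "2017-03-30", "2017-03-31", "2017-04-03", "2017-04-04",
                   "2017-04-05", "2017-04-06", "2017-04-07", "2017-04-08",
                   "2017-04-09")),
  DayofMonth = c(18, 19, 20, 21, 22, 23, 1, 2, 3, 4, 5, 6, 7)
)

# dom2[i] holds the DayofMonth value 3 rows ahead; the last 3 entries are NA
# because the index runs past the end of the vector
df$dom2 <- df$DayofMonth[4:(nrow(df) + 3)]

# DayofMonth <= 5 : the row is among the first 5 trading days of its month
# dom2 <= 3       : the row 3 positions ahead is within the first 3 days of a
#                   month, so this row is among the last 3 of the previous one
# subset() drops rows where the condition evaluates to NA
result <- subset(df, DayofMonth <= 5 | dom2 <= 3)
```

Here `result` contains the eight rows from 2017-03-29 through 2017-04-07. The final two rows (2017-04-08 and 2017-04-09) are dropped because `FALSE | NA` evaluates to `NA`, while 2017-04-07 is kept because `TRUE | NA` is `TRUE`.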