Question

I'm working with some portfolio data and am confused by this kind of data manipulation. I have this sample data:

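date        id    weight  weight_change (desired column)
2020-01-31  KO    0.3     NA
2020-01-31  AAPL  0.4     NA
2020-01-31  MSFT  0.3     NA
2020-02-29  KO    0.5     0.2
2020-02-29  AAPL  0.3     -0.1
2020-02-29  GOOG  0.2     0.2
2020-03-31  KO    0.6     0.1
2020-03-31  AAPL  0.2     -0.1
2020-03-31  MSFT  0.2     0.2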

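These are the positions in a sample portfolio. The portfolio gets new weights every month. What I want to calculate is the change in each position's weight relative to its weight in the previous month. In this example, at the end of February, KO has a current weight of 0.5, an increase of 0.2 over the previous month. AAPL fell by 0.1, and GOOG replaced MSFT, so its change versus the previous month is its entire current weight: 0.2. How can I set up a mutate that looks up each stock at the previous date and computes the difference between the weights?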

Answer 1

If the data for each "id" is monthly, we can use complete() to fill in the missing months and then take the diff() within each group:

library(dplyr)
library(tidyr)
library(zoo)    
df %>%
     mutate(yearmonth = as.Date(as.yearmon(date))) %>%
     group_by(id) %>% 
     complete(yearmonth = seq(first(yearmonth), last(yearmonth), by = '1 month')) %>%
     mutate(weight_change = if(n() == 1) weight else c(NA, diff(replace_na(weight, 0)))) %>%
     ungroup %>%
     select(names(df), weight_change) %>%
     filter(!is.na(date))
# A tibble: 9 x 5
#  date       id    weight `weight_change (desired column)` weight_change
#  <date>     <chr>  <dbl>                            <dbl>         <dbl>
#1 2020-01-31 AAPL     0.4                             NA          NA    
#2 2020-02-29 AAPL     0.3                             -0.1        -0.1  
#3 2020-03-31 AAPL     0.2                             -0.1        -0.100
#4 2020-02-29 GOOG     0.2                              0.2         0.2  
#5 2020-01-31 KO       0.3                             NA          NA    
#6 2020-02-29 KO       0.5                              0.2         0.2  
#7 2020-03-31 KO       0.6                              0.1         0.100
#8 2020-01-31 MSFT     0.3                             NA          NA    
#9 2020-03-31 MSFT     0.2                              0.2         0.2
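
To see why the missing month has to be filled in before differencing, here is a minimal sketch on a single stock. The tibble msft and its first-of-month yearmonth values are made up for illustration (they mirror MSFT in the sample data, which is held in January and March but not February); they are not part of the original answer.

```r
library(dplyr)
library(tidyr)

# Hypothetical single-stock example: held in January and March, absent in February
msft <- tibble(
  yearmonth = as.Date(c("2020-01-01", "2020-03-01")),
  weight    = c(0.3, 0.2)
)

res <- msft %>%
  # insert a row for the missing February; its weight is NA
  complete(yearmonth = seq(min(yearmonth), max(yearmonth), by = "1 month")) %>%
  # assumption: a missing month means weight 0; then difference consecutive months
  mutate(weight_change = c(NA, diff(replace_na(weight, 0))))

res
```

Without the complete() step, diff() would compare March directly with January and give -0.1 instead of 0.2 for March. The February helper row is what the final filter(!is.na(date)) in the answer removes again.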

Answer 2

Here is my less compact solution. I only use a few helper columns, which I left in the pipeline so that the logic is easy to follow.

library(tidyverse)
library(lubridate)

df <- tibble(
  date = c("2020-01-31", "2020-01-31", "2020-01-31", 
                   "2020-02-29", "2020-02-29", "2020-02-29",
                   "2020-03-31", "2020-03-31", "2020-03-31"),
  id = c("KO", "AAPL", "MSFT", "KO", "AAPL", "GOOG", "KO", "AAPL", "MSFT"),
  weight = c(0.3, 0.4, 0.3, 0.5, 0.3, 0.2, 0.6, 0.2, 0.2),
  `weight_change (desired_column)` = c(NA, NA, NA, 0.2, -0.1, 0.2, 0.1, -0.1, 0.2)
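) %>%
  mutate(date = as_date(date)) # parse the dates first so that df$date below is a Date

# new code starts here; df has to exist already, because the pipeline refers to df$date
df %>%
  mutate(
    date_ym = floor_date(date,
                         unit = "month"))%>%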
  group_by(id)%>%
  arrange(date)%>%
  mutate(id_n = row_number(),
         prev_exist = case_when(lag(date_ym) == date_ym - months(1) ~ "immediate month", #if there is an immediate month
                                id_n == 1 & date != min(df$date)~ "new month", #if this is a new month
                                TRUE ~ "no immediate month"),
         weight_change = case_when(prev_exist == "new month"~ weight,
                                   prev_exist == "no immediate month" & id_n > 1~ weight,
                                   TRUE ~ weight-lag(weight)),
         date_ym = NULL,
         id_n  = NULL,
         prev_exist = NULL)

Answer 3

A timetk approach:

library(timetk)
df %>% 
   mutate(Month = lubridate::floor_date(date, "month")) %>%
   group_by(id) %>% 
   timetk::pad_by_time(.date_var = Month, .by="month") %>% 
   select(-Month) %>% 
   mutate(WC = if(n() == 1) weight else c(NA, diff(weight)))

# A tibble: 10 x 5
# Groups:   id [4]
   id    date       weight weight_change     WC
   <chr> <date>      <dbl>         <dbl>  <dbl>
 1 KO    2020-01-31    0.3          NA   NA    
 2 KO    2020-02-29    0.5           0.2  0.2  
 3 KO    2020-03-31    0.6           0.1  0.100
 4 AAPL  2020-01-31    0.4          NA   NA    
 5 AAPL  2020-02-29    0.3          -0.1 -0.1  
 6 AAPL  2020-03-31    0.2          -0.1 -0.100
 7 MSFT  2020-01-31    0.3          NA   NA    
 8 MSFT  NA           NA            NA   NA    
 9 MSFT  2020-03-31    0.2           0.2 NA    
10 GOOG  2020-02-29    0.2           0.2  0.2
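
Note that WC is NA for MSFT in March: the padded February row has an NA weight, and diff() propagates it. One possible adjustment, assuming a missing month should count as a weight of 0 (the same assumption Answer 1 makes with replace_na), is sketched below. The df is rebuilt from the sample data so that the block runs on its own; the replace_na(), ungroup() and filter() steps are additions and not part of the original answer.

```r
library(dplyr)
library(tidyr)
library(lubridate)
library(timetk)

# Sample data rebuilt from the question
df <- tibble(
  date = as_date(c("2020-01-31", "2020-01-31", "2020-01-31",
                   "2020-02-29", "2020-02-29", "2020-02-29",
                   "2020-03-31", "2020-03-31", "2020-03-31")),
  id = c("KO", "AAPL", "MSFT", "KO", "AAPL", "GOOG", "KO", "AAPL", "MSFT"),
  weight = c(0.3, 0.4, 0.3, 0.5, 0.3, 0.2, 0.6, 0.2, 0.2)
)

res <- df %>%
  mutate(Month = floor_date(date, "month")) %>%
  group_by(id) %>%
  # add one row per missing month within each id; padded rows have NA date and weight
  pad_by_time(.date_var = Month, .by = "month") %>%
  # assumption: a missing month means the stock was not held, i.e. weight 0
  mutate(WC = if (n() == 1) weight else c(NA, diff(replace_na(weight, 0)))) %>%
  ungroup() %>%
  filter(!is.na(date)) %>%  # drop the padded helper rows again
  select(-Month)

res
```

With this change the March row for MSFT should carry its full weight as the change, in line with the desired column.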

Calculating monthly weight differences