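I am trying to build portfolios in R, and I need to sort different stocks (PERMNO) into six different portfolios.
I want to create logic that classifies each stock by whether its mkt.cap is above or below the median mkt.cap of all stocks in a given year (for example 2010).
In addition, within each of these two groups, the stocks should be split into 3 groups based on BM (OBS).
The classification should look like this: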
                      Mkt. Cap
BM (OBS) percentile   Over yearly median   Under yearly median
>70%                  PF1                  PF2
30-70%                PF3                  PF4
<30%                  PF5                  PF6
A sample from my data table looks like this:
PERMNO Date ret mkt.cap BM (OBS)
10001 2009-12 0,1626 44918,3008 0,00000000000000000000
75672 2009-12 -0,2062 43722,1389 0,00001104509093018260
80928 2009-12 0,1770 689062,2694 0,00000688713518454942
80912 2009-12 -0,0274 71494,3516 0,00000984511341873784
76261 2009-12 0,0315 382438,0821 0,00000213437164919912
90303 2009-12 0,1959 964578,8864 0,00000000000000000000
91161 2009-12 0,2808 371170,0671 0,00000504687787573149
89841 2009-12 0,0438 1235170,0000 0,00000000000000000000
82515 2009-12 0,0565 934767,3563 0,00002803828655806010
84330 2009-12 -0,1000 166769,8187 0,00014664615387307400
10001 2010-01 -0,0189 43871,6618 0,00000000000000000000
75672 2010-01 -0,0260 42586,5000 0,00001115063263397240
80928 2010-01 -0,0704 640548,3269 0,00000728527479914769
80912 2010-01 0,0256 73322,8542 0,00000943960571401137
76261 2010-01 -0,0334 369662,6679 0,00000217133254998311
90303 2010-01 -0,1095 858998,8864 0,00000000000000000000
91161 2010-01 -0,1217 325990,6705 0,00000565055792544003
89841 2010-01 -0,0480 1175881,8965 0,00000000000000000000
82515 2010-01 -0,0377 899493,1499 0,00002865219568686880
84330 2010-01 0,0873 181329,0906 0,00013295614165661100
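My dataset is very large, so the code should run quickly on large datasets.
I was thinking of creating 6 new binary variables for the portfolios, each equal to 0 or 1 depending on whether the stock meets the respective criteria, but I don't know how to do that.
Thanks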
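Answer 0 (score: 0)
If you want to compute the new columns from yearly aggregates/quantiles, use the code below (the BM (OBS) column is named BM_OBS here):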
# Extract the year from the "YYYY-MM" date string
df$YEAR <- substr(df$Date, 1, 4)
# ave() applies FUN within each YEAR and returns 0/1 per row;
# each PF flag combines a BM_OBS bucket (>= 70%, 30-70%, < 30%) with a size bucket (>= or < yearly median)
df$PF1 <- as.numeric(ave(df$BM_OBS, df$YEAR, FUN = function(x){x >= quantile(x, 0.7)}) & ave(df$mkt.cap, df$YEAR, FUN = function(x){x >= median(x)}))
df$PF2 <- as.numeric(ave(df$BM_OBS, df$YEAR, FUN = function(x){x >= quantile(x, 0.7)}) & ave(df$mkt.cap, df$YEAR, FUN = function(x){x < median(x)}))
df$PF3 <- as.numeric(ave(df$BM_OBS, df$YEAR, FUN = function(x){x < quantile(x, 0.7) & x >= quantile(x, 0.3)}) & ave(df$mkt.cap, df$YEAR, FUN = function(x){x >= median(x)}))
df$PF4 <- as.numeric(ave(df$BM_OBS, df$YEAR, FUN = function(x){x < quantile(x, 0.7) & x >= quantile(x, 0.3)}) & ave(df$mkt.cap, df$YEAR, FUN = function(x){x < median(x)}))
df$PF5 <- as.numeric(ave(df$BM_OBS, df$YEAR, FUN = function(x){x < quantile(x, 0.3)}) & ave(df$mkt.cap, df$YEAR, FUN = function(x){x >= median(x)}))
df$PF6 <- as.numeric(ave(df$BM_OBS, df$YEAR, FUN = function(x){x < quantile(x, 0.3)}) & ave(df$mkt.cap, df$YEAR, FUN = function(x){x < median(x)}))
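Since the question asks for speed on a large dataset, one possible variation is to compute the size flag and the BM bucket only once per year (two `ave()` calls instead of twelve) and then derive a single portfolio number from them. The following is only a sketch in base R under the same cut-offs as above; the names `toy`, `big`, `bm_row`, `PF` and `dummies` are made up for illustration, and `toy` is just the 2009-12 rows of the sample data.

```r
# Toy data: the ten 2009-12 rows of the sample (hypothetical object name `toy`)
toy <- data.frame(
  PERMNO  = c(10001L, 75672L, 80928L, 80912L, 76261L,
              90303L, 91161L, 89841L, 82515L, 84330L),
  Date    = "2009-12",
  mkt.cap = c(44918.3008, 43722.1389, 689062.2694, 71494.3516, 382438.0821,
              964578.8864, 371170.0671, 1235170, 934767.3563, 166769.8187),
  BM_OBS  = c(0, 1.10450909301826e-05, 6.88713518454942e-06,
              9.84511341873784e-06, 2.13437164919912e-06, 0,
              5.04687787573149e-06, 0, 2.80382865580601e-05,
              0.000146646153873074)
)
toy$YEAR <- substr(toy$Date, 1, 4)

# Size flag per year: 1 = at or above the yearly median mkt.cap, 0 = below it
big <- ave(toy$mkt.cap, toy$YEAR, FUN = function(x) x >= median(x))

# BM bucket per year: 1 = top 30%, 2 = middle 40%, 3 = bottom 30%
bm_row <- ave(toy$BM_OBS, toy$YEAR, FUN = function(x) {
  q <- quantile(x, c(0.3, 0.7))
  3 - (x >= q[[1]]) - (x >= q[[2]])
})

# Map (BM row, size column) of the 3 x 2 grid onto portfolio numbers 1..6
toy$PF <- 2 * (bm_row - 1) + ifelse(big == 1, 1, 2)

# Optional: expand the single label back into six 0/1 dummy columns
dummies <- sapply(1:6, function(k) as.integer(toy$PF == k))
colnames(dummies) <- paste0("PF", 1:6)
```

A single `PF` column is usually easier to group by later than six separate dummies, and the dummies can still be rebuilt from it when needed.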
This gives:
> df
PERMNO Date ret mkt.cap BM_OBS YEAR PF1 PF2 PF3 PF4 PF5 PF6
1 10001 2009-12 0.1626 44918.30 0.000000e+00 2009 0 0 0 0 0 1
2 75672 2009-12 -0.2062 43722.14 1.104509e-05 2009 0 1 0 0 0 0
3 80928 2009-12 0.1770 689062.27 6.887135e-06 2009 0 0 1 0 0 0
4 80912 2009-12 -0.0274 71494.35 9.845113e-06 2009 0 0 0 1 0 0
5 76261 2009-12 0.0315 382438.08 2.134372e-06 2009 0 0 1 0 0 0
6 90303 2009-12 0.1959 964578.89 0.000000e+00 2009 0 0 0 0 1 0
7 91161 2009-12 0.2808 371170.07 5.046878e-06 2009 0 0 0 1 0 0
8 89841 2009-12 0.0438 1235170.00 0.000000e+00 2009 0 0 0 0 1 0
9 82515 2009-12 0.0565 934767.36 2.803829e-05 2009 1 0 0 0 0 0
10 84330 2009-12 -0.1000 166769.82 1.466462e-04 2009 0 1 0 0 0 0
11 10001 2010-01 -0.0189 43871.66 0.000000e+00 2010 0 0 0 0 0 1
12 75672 2010-01 -0.0260 42586.50 1.115063e-05 2010 0 1 0 0 0 0
13 80928 2010-01 -0.0704 640548.33 7.285275e-06 2010 0 0 1 0 0 0
14 80912 2010-01 0.0256 73322.85 9.439606e-06 2010 0 0 0 1 0 0
15 76261 2010-01 -0.0334 369662.67 2.171333e-06 2010 0 0 1 0 0 0
16 90303 2010-01 -0.1095 858998.89 0.000000e+00 2010 0 0 0 0 1 0
17 91161 2010-01 -0.1217 325990.67 5.650558e-06 2010 0 0 0 1 0 0
18 89841 2010-01 -0.0480 1175881.90 0.000000e+00 2010 0 0 0 0 1 0
19 82515 2010-01 -0.0377 899493.15 2.865220e-05 2010 1 0 0 0 0 0
20 84330 2010-01 0.0873 181329.09 1.329561e-04 2010 0 1 0 0 0 0
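As a quick sanity check on output like the above, each row should belong to exactly one of the six portfolios. The sketch below copies the flags of the first ten rows (year 2009) into a matrix by hand so that it runs on its own; the names `flags`, `counts` and `label` are illustrative only.

```r
# Flags copied from the first ten rows of the printed output (year 2009)
flags <- matrix(c(
  0, 0, 0, 0, 0, 1,
  0, 1, 0, 0, 0, 0,
  0, 0, 1, 0, 0, 0,
  0, 0, 0, 1, 0, 0,
  0, 0, 1, 0, 0, 0,
  0, 0, 0, 0, 1, 0,
  0, 0, 0, 1, 0, 0,
  0, 0, 0, 0, 1, 0,
  1, 0, 0, 0, 0, 0,
  0, 1, 0, 0, 0, 0
), ncol = 6, byrow = TRUE, dimnames = list(NULL, paste0("PF", 1:6)))

# Every row should fall into exactly one portfolio
stopifnot(all(rowSums(flags) == 1))

# Number of rows assigned to each portfolio
counts <- colSums(flags)

# Collapse the six dummies into a single label per row
label <- colnames(flags)[max.col(flags, ties.method = "first")]
```

On the full data frame, the same check should work with `flags <- as.matrix(df[paste0("PF", 1:6)])`.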
Data used:
df <- structure(list(PERMNO = c(10001L, 75672L, 80928L, 80912L, 76261L,
90303L, 91161L, 89841L, 82515L, 84330L, 10001L, 75672L, 80928L,
80912L, 76261L, 90303L, 91161L, 89841L, 82515L, 84330L), Date = c("2009-12",
"2009-12", "2009-12", "2009-12", "2009-12", "2009-12", "2009-12",
"2009-12", "2009-12", "2009-12", "2010-01", "2010-01", "2010-01",
"2010-01", "2010-01", "2010-01", "2010-01", "2010-01", "2010-01",
"2010-01"), ret = c(0.1626, -0.2062, 0.177, -0.0274, 0.0315,
0.1959, 0.2808, 0.0438, 0.0565, -0.1, -0.0189, -0.026, -0.0704,
0.0256, -0.0334, -0.1095, -0.1217, -0.048, -0.0377, 0.0873),
mkt.cap = c(44918.3008, 43722.1389, 689062.2694, 71494.3516,
382438.0821, 964578.8864, 371170.0671, 1235170, 934767.3563,
166769.8187, 43871.6618, 42586.5, 640548.3269, 73322.8542,
369662.6679, 858998.8864, 325990.6705, 1175881.8965, 899493.1499,
181329.0906), BM_OBS = c(0, 1.10450909301826e-05, 6.88713518454942e-06,
9.84511341873784e-06, 2.13437164919912e-06, 0, 5.04687787573149e-06,
0, 2.80382865580601e-05, 0.000146646153873074, 0, 1.11506326339724e-05,
7.28527479914769e-06, 9.43960571401137e-06, 2.17133254998311e-06,
0, 5.65055792544003e-06, 0, 2.86521956868688e-05, 0.000132956141656611
)), class = "data.frame", row.names = c(NA, -20L))
PERMNO Date ret mkt.cap BM_OBS
1 10001 2009-12 0.1626 44918.30 0.000000e+00
2 75672 2009-12 -0.2062 43722.14 1.104509e-05
3 80928 2009-12 0.1770 689062.27 6.887135e-06
4 80912 2009-12 -0.0274 71494.35 9.845113e-06
5 76261 2009-12 0.0315 382438.08 2.134372e-06
6 90303 2009-12 0.1959 964578.89 0.000000e+00
7 91161 2009-12 0.2808 371170.07 5.046878e-06
8 89841 2009-12 0.0438 1235170.00 0.000000e+00
9 82515 2009-12 0.0565 934767.36 2.803829e-05
10 84330 2009-12 -0.1000 166769.82 1.466462e-04
11 10001 2010-01 -0.0189 43871.66 0.000000e+00
12 75672 2010-01 -0.0260 42586.50 1.115063e-05
13 80928 2010-01 -0.0704 640548.33 7.285275e-06
14 80912 2010-01 0.0256 73322.85 9.439606e-06
15 76261 2010-01 -0.0334 369662.67 2.171333e-06
16 90303 2010-01 -0.1095 858998.89 0.000000e+00
17 91161 2010-01 -0.1217 325990.67 5.650558e-06
18 89841 2010-01 -0.0480 1175881.90 0.000000e+00
19 82515 2010-01 -0.0377 899493.15 2.865220e-05
20 84330 2010-01 0.0873 181329.09 1.329561e-04
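Note that the sample in the question uses decimal commas and a column called `BM (OBS)`, whereas the data above already has numeric columns and the name `BM_OBS`. If the raw file looks like the question's sample, something along these lines might be needed before running the code. This is a sketch that assumes whitespace-separated text; `txt` and `raw` are placeholder names, and only the first three sample rows are included.

```r
# First three rows of the question's sample, with decimal commas
txt <- "PERMNO Date ret mkt.cap BM (OBS)
10001 2009-12 0,1626 44918,3008 0,00000000000000000000
75672 2009-12 -0,2062 43722,1389 0,00001104509093018260
80928 2009-12 0,1770 689062,2694 0,00000688713518454942"

# Skip the header (it contains a space inside "BM (OBS)") and supply clean names;
# dec = "," makes read.table parse 0,1626 as the number 0.1626
raw <- read.table(
  text       = txt,
  skip       = 1,
  dec        = ",",
  col.names  = c("PERMNO", "Date", "ret", "mkt.cap", "BM_OBS"),
  colClasses = c("integer", "character", "numeric", "numeric", "numeric")
)
```

For a file on disk, `text = txt` would be replaced by the file path.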