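I am trying to loop over several data frames in R, extract one column from each, and subtract the corresponding element of a separate list from it. For example, I want: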
df1$my_new_col <- df1$my_col - my_list[[1]]
df2$my_new_col <- df2$my_col - my_list[[2]]
# ...and so on
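One way to express this one-to-one pairing without an explicit loop is base R's Map(), which walks two lists in parallel so that the first data frame is paired with the first list element, the second with the second, and so on. This is a minimal sketch, assuming df1, df2, my_col, and my_list are the placeholder names from the example above; the values are the first two rows of the date 1 and date 2 tables and the first two S&P values shown later in the question.

```r
# Placeholder data frames mirroring the df1/df2 example above.
df1 <- data.frame(my_col = c(25.27, 24.98))
df2 <- data.frame(my_col = c(21.03, 16.33))
my_list <- list(11.22854225, 7.041799573)

# Map() calls the function with (df1, my_list[[1]]), then (df2, my_list[[2]]),
# and returns a list of the modified data frames.
result <- Map(function(df, benchmark) {
  df$my_new_col <- df$my_col - benchmark
  df
}, list(df1, df2), my_list)

result[[1]]$my_new_col  # df1's column minus the first benchmark only
result[[2]]$my_new_col  # df2's column minus the second benchmark only
```

Because each data frame only ever sees its own benchmark, there is no second loop that could overwrite the result.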
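The code I wrote only picks up the last element of the list and uses it in every calculation.
In short, I have a list called dates, which holds the dates, and a list called spx_list, which holds the S&P 500 annualized returns.
In my code, I loop over each date and pull the stock-return data frame for that date. In a second, nested loop, I go through the S&P return list, which also has one entry per date, and try to subtract each S&P 500 return from every stock's return for that period. When extracted for each date, my data frames look like this: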
For date 1:
Ticker          Name              Total.Return.Y.3..I.
JNS US Equity   JANUS CAPITAL GR  25.27
UNP US Equity   UNION PAC CORP    24.98
CVX US Equity   CHEVRON CORP      24.87
BHI US Equity   BAKER HUGHES A G  24.81
RAI US Equity   REYNOLDS AMERICA  24.72
XOM US Equity   EXXON MOBIL CORP  24.55
CBRE US Equity  CBRE GROUP INC-A  24.43
GT US Equity    GOODYEAR TIRE     24.39
For date 2:
Ticker          Name              Total.Return.Y.3..I.
JNS US Equity   JANUS CAPITAL GR  21.03
UNP US Equity   UNION PAC CORP    16.33
CVX US Equity   CHEVRON CORP      12.21
BHI US Equity   BAKER HUGHES A G  47.69
RAI US Equity   REYNOLDS AMERICA  18.39
XOM US Equity   EXXON MOBIL CORP  24.50
CBRE US Equity  CBRE GROUP INC-A  10.81
GT US Equity    GOODYEAR TIRE     11.13
For my S&P data:
Ticker     date        Annualized 3
SPX INDEX  3/31/2019   11.22854225
SPX INDEX  12/31/2018  7.041799573
SPX INDEX  9/30/2018   14.91926793
SPX INDEX  6/30/2018   9.629826851
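Because spx_3.csv is read with read.csv(), two details of this table matter whenever it is matched against other data by date. First, the header "Annualized 3" is renamed to the syntactic name Annualized.3, which is why spx$Annualized.3 works in the code. Second, dates written like 3/31/2019 are not a format that as.Date() recognizes without an explicit format string. This is a minimal sketch, assuming the file is comma-separated and stores the dates exactly as displayed; two rows are inlined with read.csv(text = ...) so it runs without the file.

```r
# Two rows copied from the S&P table above (assumed comma-separated on disk).
csv_text <- "Ticker,date,Annualized 3
SPX INDEX,3/31/2019,11.22854225
SPX INDEX,12/31/2018,7.041799573"

spx <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# check.names = TRUE (the default) turns "Annualized 3" into "Annualized.3".
names(spx)  # "Ticker" "date" "Annualized.3"

# Month/day/year strings need an explicit format to become Date values.
spx$date <- as.Date(spx$date, format = "%m/%d/%Y")
spx$date  # "2019-03-31" "2018-12-31"
```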
List data:
dates <- list('2019-03-31','2018-12-31','2018-09-30','2018-06-30',
'2018-03-31','2017-12-31','2017-09-30','2017-06-30',
'2017-03-31','2016-12-31','2016-09-30','2016-06-30',
'2016-03-31','2015-12-31','2015-09-30','2015-06-30',
'2015-03-31','2014-12-31','2014-09-30','2014-06-30',
'2014-03-31','2013-12-31','2013-09-30','2013-06-30',
'2013-03-31','2012-12-31','2012-09-30','2012-06-30',
'2012-03-31','2011-12-31','2011-09-30','2011-06-30',
'2011-03-31','2010-12-31','2010-09-30','2010-06-30',
'2010-03-31','2009-12-31','2009-09-30','2009-06-30',
'2009-03-31','2008-12-31','2008-09-30','2008-06-30',
'2008-03-31','2007-12-31','2007-09-30','2007-06-30',
'2007-03-31','2006-12-31','2006-09-30','2006-06-30',
'2006-03-31','2005-12-31','2005-09-30','2005-06-30',
'2005-03-31','2004-12-31','2004-09-30','2004-06-30',
'2004-03-31','2003-12-31','2003-09-30','2003-06-30',
'2003-03-31','2002-12-31','2002-09-30','2002-06-30',
'2002-03-31','2001-12-31','2001-09-30','2001-06-30',
'2001-03-31','2000-12-31','2000-09-30','2000-06-30',
'2000-03-31')
Code:
library(Rblpapi)
blpConnect()
library(dplyr)
spx <- read.csv('spx_3.csv')
spx_list <- as.list(spx$Annualized.3)
totals <- list()
returns <- list()
for (i in dates) {
  df <- beqs('ROLLING RETURNS', 'PRIVATE', date = as.Date(i))
  df_beats <- df %>%
    select(date, Ticker, Total.Return.Y.3..I.)
  df_beats <- na.omit(df_beats)
  for (j in 1:length(spx_list)) {
    df_beats$Relative_Performance <- df_beats$Total.Return.Y.3..I. - spx_list[[j]]
    counts <- sum(df_beats$Relative_Performance > 0)
    yes <- df_beats %>%
      filter(Relative_Performance > 0)
    averages <- mean(yes$Total.Return.Y.3..I.)
    totals[[i]] <- counts
    returns[[i]] <- averages
  }
}
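The goal is to find the percentage of stocks that beat the S&P 500 in a given year, and to measure each stock's under- or outperformance by subtracting the S&P 500 return from that stock's return.
When the loop finishes, I find that only the last element of the S&P 500 list was used in the calculation and the other returns were skipped. In this case, the value 9.629 is used for every data frame in every period. Ideally, I would subtract 11.22 from Date 1's Total.Return.Y.3..I., 7.04 from Date 2's Total.Return.Y.3..I., and so on.
Can anyone help me use the S&P 500 return for each period, rather than just the last element of the list?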
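The symptom follows from the nested loop: for each date i, the inner loop runs over every j, and each pass overwrites Relative_Performance, totals[[i]], and returns[[i]], so only the final j (the last S&P value) survives. A minimal fix keeps a single loop and indexes dates and spx_list by the same position k. This assumes the two lists have the same length and order (both shown above start at 2019-03-31 and go backwards). In the sketch, get_returns() is a hypothetical stand-in for the beqs() call that returns three rows from each table above, so it runs without a Bloomberg connection. It also uses base-R row subsetting instead of dplyr::filter() to stay dependency-free.

```r
# Hypothetical stand-in for beqs(): first three rows of the date 1 / date 2 tables.
get_returns <- function(d) {
  values <- list(
    "2019-03-31" = c(25.27, 24.98, 24.87),
    "2018-12-31" = c(21.03, 16.33, 12.21)
  )
  data.frame(
    date = as.Date(d),
    Ticker = c("JNS US Equity", "UNP US Equity", "CVX US Equity"),
    Total.Return.Y.3..I. = values[[d]]
  )
}

dates <- list("2019-03-31", "2018-12-31")
spx_list <- list(11.22854225, 7.041799573)  # assumed same order as dates

totals <- list()
returns <- list()
for (k in seq_along(dates)) {
  i <- dates[[k]]
  df_beats <- na.omit(get_returns(i))
  # One subtraction per date: only the S&P value at the same position k.
  df_beats$Relative_Performance <- df_beats$Total.Return.Y.3..I. - spx_list[[k]]
  winners <- df_beats[df_beats$Relative_Performance > 0, ]
  totals[[i]] <- nrow(winners)
  returns[[i]] <- mean(winners$Total.Return.Y.3..I.)
}
```

With these rows every stock beats its period's benchmark, so both counts are 3. The point is that date 1 is compared with 11.23 and date 2 with 7.04, rather than both being compared with the last value.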
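Answer 0 (score: 0)
Consider processing this by group instead of with nested loops. Specifically, build a single beqs dataset, merge it with spx by date to compute the differences, and finally aggregate the totals and returns you need by date.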
library(Rblpapi)
blpConnect()
spx <- read.csv('spx_3.csv')
# CONVERT M/D/YYYY STRINGS TO Date (as.Date() CANNOT GUESS THIS FORMAT)
spx$date <- as.Date(spx$date, format = "%m/%d/%Y")
# BUILD LIST OF BEQS DATA FRAMES FOR EACH QUARTERLY DATE
df_list <- lapply(spx$date, function(d) {
  df <- beqs('ROLLING RETURNS', 'PRIVATE', date = d)
  df <- df[c("date", "Ticker", "Total.Return.Y.3..I.")]
  return(na.omit(df))
})
# APPEND ALL FOR SINGLE DATA FRAME
df_beqs <- do.call(rbind, df_list)
# MERGE AND ADD NEW COLUMN (read.csv RENAMES "Annualized 3" TO Annualized.3)
final_df <- transform(merge(df_beqs, spx, by = "date"),
                      Relative_Performance = Total.Return.Y.3..I. - Annualized.3)
# FILTER DATA FRAME
final_df <- final_df[final_df$Relative_Performance > 0, ]
# AGGREGATE BY DATE FOR MATRIX OUTPUT
agg_df <- aggregate(Total.Return.Y.3..I. ~ date, final_df,
                    function(x) c(totals = length(x), returns = mean(x)))
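Because beqs() needs a live Bloomberg connection, the following sketch replays the same merge-then-compute steps on a few rows copied from the tables in the question. It also computes the share of stocks beating the S&P 500 on each date, which is the percentage the question asks for. The share has to be computed on the merged data before filtering. After keeping only rows with Relative_Performance > 0, every remaining row is a winner, and the denominator is gone.

```r
# A few rows copied from the date 1 and date 2 tables in the question.
df_beqs <- data.frame(
  date = as.Date(rep(c("2019-03-31", "2018-12-31"), each = 3)),
  Ticker = rep(c("JNS US Equity", "UNP US Equity", "CVX US Equity"), 2),
  Total.Return.Y.3..I. = c(25.27, 24.98, 24.87, 21.03, 16.33, 12.21)
)
spx <- data.frame(
  Ticker = "SPX INDEX",
  date = as.Date(c("2019-03-31", "2018-12-31")),
  Annualized.3 = c(11.22854225, 7.041799573)
)

# Both inputs have a Ticker column; merge only the columns the calculation needs.
merged <- merge(df_beqs, spx[c("date", "Annualized.3")], by = "date")
merged$Relative_Performance <- merged$Total.Return.Y.3..I. - merged$Annualized.3

# Percentage of stocks beating the index on each date, computed before filtering.
pct_beat <- aggregate(Relative_Performance ~ date, merged,
                      function(x) 100 * mean(x > 0))
pct_beat
```

With these rows every stock beats the index on both dates, so both percentages are 100. The per-stock Relative_Performance column in merged gives each stock's out- or underperformance directly.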