I have two panel data frames. The first one holds boolean values at a yearly frequency:
Criteria
structure(list(Name = c("ff", "fd", "fe", "fr", "fz", "fa", "kl","ml", "az","er", "ff", "fd", "fe", "fr", "fz", "fa", "kl", "ml","az", "er"), Date =c("01/31/1992", "01/31/1992", "01/31/1992", "01/31/1992", "01/31/1992", "01/31/1992", "01/31/1992", "01/31/1992","01/31/1992", "01/31/1992", "01/31/1993", "01/31/1993", "01/31/1993","01/31/1993", "01/31/1993","01/31/1993", "01/31/1993", "01/31/1993", "01/31/1993", "01/31/1993"), Value = c("FALSE", "TRUE", "FALSE", "FALSE", "FALSE", "FALSE", "TRUE", "FALSE", "TRUE", "FALSE", "FALSE", "FALSE", "TRUE", "FALSE", "FALSE", "FALSE", "TRUE", "FALSE", "FALSE", "TRUE")), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, -20L), .Names = c("Name", "Date", "Value"))
Date: in the real data it changes every year up to 2016.
The second data frame, df2, is also a panel table:
Names. Date Value
1 ff 01/31/1992 0.2000
2 fd 01/31/1992 0.4300
3 fe 01/31/1992 NA
4 fr 01/31/1992 0.5200
5 fz 01/31/1992 -0.0020
6 fa 01/31/1992 NA
7 kl 01/31/1992 0.0010
8 ml 01/31/1992 NA
9 az 01/31/1992 0.2200
10 er 01/31/1992 0.3200
11 ff 05/31/1992 0.0000
12 fd 05/31/1992 0.0010
13 fe 05/31/1992 0.0320
14 fr 05/31/1992 0.9123
15 fz 05/31/1992 1.0000
16 fa 05/31/1992 0.3200
17 kl 05/31/1992 0.4300
18 ml 05/31/1992 0.0312
19 az 05/31/1992 0.0312
20 er 05/31/1992 0.4300
21 ff 03/31/1993 0.5300
22 fd 03/31/1993 0.8400
23 fe 03/31/1993 0.0010
24 fr 03/31/1993 -0.0123
25 fz 03/31/1993 0.4300
26 fa 03/31/1993 0.1340
27 kl 03/31/1993 0.7400
28 ml 01/31/1993 0.0312
29 az 01/31/1993 0.9324
30 er 01/31/1993 0.0600
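Date: changes every month until year = 2016.
Here Name holds the same company names as in "Criteria", Date is monthly within the same years as in the first data frame, and Value is a numeric value representing the company's return.
I would like to create a df3: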
Date Portfolio1
01/31/1992 (the monthly mean of companies whose criteria is TRUE this year)
05/31/1992 (the monthly mean of companies whose criteria is TRUE this year)
03/31/1993 (the monthly mean of companies whose criteria is TRUE this year)
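Here Date has the same dates as df2 (monthly frequency), and Portfolio1 is the mean monthly Value from df2 over the companies whose criteria is TRUE in the first data frame for that year.
I think aggregating the data.frame would be useful here.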
Answer 0 (score: 0)
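You can do this with dplyr, using left_join together with summarise. I first join by Date and Name. Then I use filter to remove all rows that are not TRUE. Then I group_by Date and use summarise to compute the mean.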
library(dplyr)
Criteria <- structure(list(Name = c("ff", "fd", "fe", "fr", "fz", "fa", "kl","ml", "az","er", "ff", "fd", "fe", "fr", "fz", "fa", "kl", "ml","az", "er"),
Date =c("01/31/1992", "01/31/1992", "01/31/1992", "01/31/1992", "01/31/1992", "01/31/1992", "01/31/1992", "01/31/1992","01/31/1992", "01/31/1992", "01/31/1993",
"01/31/1993", "01/31/1993","01/31/1993", "01/31/1993","01/31/1993", "01/31/1993", "01/31/1993", "01/31/1993", "01/31/1993"), Value = c("FALSE", "TRUE", "FALSE",
"FALSE", "FALSE", "FALSE", "TRUE", "FALSE", "TRUE", "FALSE", "FALSE", "FALSE", "TRUE", "FALSE", "FALSE", "FALSE", "TRUE", "FALSE", "FALSE", "TRUE")),
class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, -20L), .Names = c("Name", "Date", "Value"))
df2 <- read.table(text="Name Date Value
ff 01/31/1992 0.2
fd 01/31/1992 0.43
fe 01/31/1992 NA
fr 01/31/1992 0.52
fz 01/31/1992 -0.002
fa 01/31/1992 NA
kl 01/31/1992 0.001
ml 01/31/1992 NA
az 01/31/1992 0.22
er 01/31/1992 0.32
ff 05/31/1992 0
fd 05/31/1992 0.001
fe 05/31/1992 0.032
fr 05/31/1992 0.9123
fz 05/31/1992 1
fa 05/31/1992 0.32
kl 05/31/1992 0.43
ml 05/31/1992 0.0312
az 05/31/1992 0.0312
er 05/31/1992 0.43
ff 03/31/1993 0.53
fd 03/31/1993 0.84
fe 03/31/1993 0.001
fr 03/31/1993 -0.0123
fz 03/31/1993 0.43
fa 03/31/1993 0.134
kl 03/31/1993 0.74
ml 01/31/1993 0.0312
az 01/31/1993 0.9324
er 01/31/1993 0.06",header=TRUE, stringsAsFactors=FALSE)
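# Join returns to criteria on company and exact date (Value.x = return, Value.y = criteria)
left_join(df2, Criteria, by = c("Name" = "Name", "Date" = "Date")) %>%
  # Keep only rows whose criteria is TRUE
  dplyr::filter(Value.y == TRUE) %>%
  # Average the returns per date, ignoring missing values
  group_by(Date) %>%
  summarise(Portfolio1 = mean(Value.x, na.rm = TRUE))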
# A tibble: 2 x 2
Date Portfolio1
<chr> <dbl>
1 01/31/1992 0.217
2 01/31/1993 0.060
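Note that the join above matches on the exact Date, so only months whose date also appears in Criteria (here the two January dates) survive. If the intent is to apply each year's criteria to every month of that year, one option is to join on Name and year instead. The following is a minimal sketch under that assumption; the `Year` column and the `true_by_year` helper are names introduced here for illustration, and the sample data from the question is rebuilt compactly.

```r
library(dplyr)

companies <- c("ff", "fd", "fe", "fr", "fz", "fa", "kl", "ml", "az", "er")

# Yearly criteria, same values as in the question (Value stored as character)
Criteria <- data.frame(
  Name  = rep(companies, 2),
  Date  = rep(c("01/31/1992", "01/31/1993"), each = 10),
  Value = c("FALSE", "TRUE", "FALSE", "FALSE", "FALSE",
            "FALSE", "TRUE", "FALSE", "TRUE", "FALSE",
            "FALSE", "FALSE", "TRUE", "FALSE", "FALSE",
            "FALSE", "TRUE", "FALSE", "FALSE", "TRUE"),
  stringsAsFactors = FALSE
)

# Monthly returns, same values as in the question
df2 <- data.frame(
  Name  = rep(companies, 3),
  Date  = c(rep("01/31/1992", 10), rep("05/31/1992", 10),
            rep("03/31/1993", 7), rep("01/31/1993", 3)),
  Value = c(0.2, 0.43, NA, 0.52, -0.002, NA, 0.001, NA, 0.22, 0.32,
            0, 0.001, 0.032, 0.9123, 1, 0.32, 0.43, 0.0312, 0.0312, 0.43,
            0.53, 0.84, 0.001, -0.0123, 0.43, 0.134, 0.74,
            0.0312, 0.9324, 0.06),
  stringsAsFactors = FALSE
)

# Companies whose criteria is TRUE, keyed by year (assumed matching rule)
true_by_year <- Criteria %>%
  mutate(Year = format(as.Date(Date, format = "%m/%d/%Y"), "%Y")) %>%
  filter(as.logical(Value)) %>%
  select(Name, Year)

# Keep the monthly rows of those companies and average per date
df3 <- df2 %>%
  mutate(Year = format(as.Date(Date, format = "%m/%d/%Y"), "%Y")) %>%
  inner_join(true_by_year, by = c("Name", "Year")) %>%
  group_by(Date) %>%
  summarise(Portfolio1 = mean(Value, na.rm = TRUE))

df3
```

On the sample data this gives one row for each distinct date in df2; for example, 05/31/1992 averages the returns of fd, kl and az, the three companies flagged TRUE for 1992. Date is still a character column, so the rows sort alphabetically rather than chronologically; converting it with as.Date before grouping would change that.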