My data has three main columns, shown below (squirrel_id = unique individual ID, byear = birth year, dyear = death year):
> summary(complete)
squirrel_id byear dyear
Min. : 416 Min. :1989 Min. :1989
1st Qu.: 4152 1st Qu.:1997 1st Qu.:1998
Median : 7870 Median :2003 Median :2004
Mean :10419 Mean :2004 Mean :2004
3rd Qu.:16126 3rd Qu.:2011 3rd Qu.:2012
Max. :23327 Max. :2017 Max. :2017
I have a second piece of data (shown below) that I am trying to merge into the dataset above.
mast.yr<-c("1993", "1998", "2005", "2010", "2014")
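I am trying to do two things:
1. Determine whether each individual (squirrel_id) was alive during any of the mast.yr years, where the range of years alive runs from byear through dyear, including both byear and dyear.
2. Count how many mast.yr years each individual (squirrel_id) experienced during its lifetime (again from byear through dyear, inclusive).
To generate the first column, I have been using the mutate function from the dplyr package, but I can only get it to work for byear and dyear separately, like this:
complete <- complete %>%
  mutate(mast = ifelse(byear %in% c("1993", "1998", "2005", "2010", "2014"), 1, 0),
         mast = ifelse(dyear %in% c("1993", "1998", "2005", "2010", "2014"), 1, 0))
But this does not give the desired output, because it checks byear and dyear on their own rather than the continuous period between them. I have tried the solutions posted here and here, without luck.
Any suggestions would be greatly appreciated!
A copy of my data can be found here. For future reproducibility, here is a sample: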
Answer 0 (score: 2)
library(dplyr)
# put the target years in a table
mastDF = tibble(year = as.integer(mast.yr))
# flag individuals with at least one mast year where byear <= year <= dyear
dat %>%
  mutate(in_mast = count_matches(., mastDF, year >= byear, year <= dyear) > 0) %>%
  as_tibble()
# A tibble: 100 x 4
squirrel_id byear dyear in_mast
<int> <int> <int> <lgl>
1 6715 2006 2006 FALSE
2 22274 2016 2017 FALSE
3 20445 2014 2017 TRUE
4 19528 2013 2013 FALSE
5 2674 1995 1995 FALSE
6 1419 1992 1992 FALSE
7 15014 2004 2004 FALSE
8 10946 2009 2012 TRUE
9 4369 1998 1999 TRUE
10 4344 1992 1999 TRUE
# ... with 90 more rows
Here count_matches is a helper function:
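library(data.table)
# For each row of DF, count the rows of targetDF that satisfy the conditions in ...
# (left side of each condition names a targetDF column, right side a DF column)
count_matches = function(DF, targetDF, ...){
  # capture the conditions unevaluated, e.g. list(year >= byear, year <= dyear)
  onexpr = substitute(list(...))
  # non-equi join; by=.EACHI returns one .N per row of DF, 0 when nothing matches
  data.table(targetDF)[data.table(DF), on=eval(onexpr), allow.cartesian=TRUE, .N, by=.EACHI]$N
}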
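As a cross-check that needs neither data.table nor a join, the same inclusive-interval count can be sketched in base R by comparing every individual against every mast year with outer(). This is a swapped-in technique, not the helper above: it builds an n-by-m logical matrix, so it is only a sensible choice when the number of target years is small. The dat rows are copied from the printed output in this thread, and mast_years is a name introduced here for illustration.

```r
# Sample rows copied from the printed output in this thread (first 10 individuals)
dat <- data.frame(
  squirrel_id = c(6715L, 22274L, 20445L, 19528L, 2674L,
                  1419L, 15014L, 10946L, 4369L, 4344L),
  byear = c(2006L, 2016L, 2014L, 2013L, 1995L, 1992L, 2004L, 2009L, 1998L, 1992L),
  dyear = c(2006L, 2017L, 2017L, 2013L, 1995L, 1992L, 2004L, 2012L, 1999L, 1999L)
)
# Hypothetical name: the mast years as integers
mast_years <- as.integer(c("1993", "1998", "2005", "2010", "2014"))

# alive[i, j] is TRUE when mast year j lies in [byear[i], dyear[i]], both ends included
alive <- outer(dat$byear, mast_years, "<=") & outer(dat$dyear, mast_years, ">=")

dat$n_mast  <- rowSums(alive)   # how many mast years each individual experienced
dat$in_mast <- dat$n_mast > 0   # alive during at least one mast year
print(dat)
```

For these rows n_mast comes out as 0 0 1 0 0 0 0 1 1 2, which agrees with the sqldf answer's output for the same individuals.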
If you want both the count and whether it is nonzero, split the count_matches call into a sequence of mutate arguments:
dat %>%
  mutate(
    n_mast = count_matches(., mastDF, year >= byear, year <= dyear),
    in_mast = n_mast > 0
  ) %>%
  as_tibble()
# A tibble: 6 x 5
squirrel_id byear dyear n_mast in_mast
<int> <int> <int> <int> <lgl>
1 6715 2006 2006 0 FALSE
2 22274 2016 2017 0 FALSE
3 20445 2014 2017 1 TRUE
4 19528 2013 2013 0 FALSE
5 2674 1995 1995 0 FALSE
6 1419 1992 1993 1 TRUE
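Because the target years form a short vector and the years are whole numbers, the count can also be sketched without forming any individual-year pairs. findInterval(x, v) returns the number of elements of a sorted v that are less than or equal to x, so the number of mast years in [byear, dyear] is the count up to dyear minus the count up to byear - 1. The helper name count_in_range is hypothetical, the approach assumes integer years with byear <= dyear, and the dat rows are again copied from the printed output in this thread.

```r
# Sample rows copied from the printed output in this thread (first 10 individuals)
dat <- data.frame(
  squirrel_id = c(6715L, 22274L, 20445L, 19528L, 2674L,
                  1419L, 15014L, 10946L, 4369L, 4344L),
  byear = c(2006L, 2016L, 2014L, 2013L, 1995L, 1992L, 2004L, 2009L, 1998L, 1992L),
  dyear = c(2006L, 2017L, 2017L, 2013L, 1995L, 1992L, 2004L, 2012L, 1999L, 1999L)
)

# Hypothetical helper: number of values in `years` that fall in [byear, dyear].
# findInterval(x, years) counts the elements of the sorted `years` that are <= x;
# subtracting the count at byear - 1 keeps only those >= byear (whole-number years).
count_in_range <- function(byear, dyear, years) {
  years <- sort(years)
  findInterval(dyear, years) - findInterval(byear - 1L, years)
}

mast_years <- as.integer(c("1993", "1998", "2005", "2010", "2014"))
dat$n_mast  <- count_in_range(dat$byear, dat$dyear, mast_years)
dat$in_mast <- dat$n_mast > 0
print(dat)
```

Since nothing is joined, the work grows with the number of individuals times log(number of mast years), which keeps it cheap on larger data. A row with byear > dyear would give a negative count, so such rows should be checked first.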
Answer 1 (score: 1)
Although @Frank has already provided an elegant solution, sqldf offers a simpler way to express non-equi joins. Using sqldf, a solution could be:
mast.yr<-c("1993", "1998", "2005", "2010", "2014")
mastDf <- data.frame(year = as.integer(mast.yr))
library(sqldf)
sqldf("select dat.*, IFNULL(Mast.inMast,0) as n_Mast, IFNULL(Mast.inMast,0) >0 as inMast
from dat left outer join
(select *, count(squirrel_id) as inMast
from dat, mastDf
where mastDf.year between dat.byear AND dat.dyear
group by squirrel_id) Mast on
dat.squirrel_id = Mast.squirrel_id")
# squirrel_id byear dyear n_Mast inMast
# 1 6715 2006 2006 0 0
# 2 22274 2016 2017 0 0
# 3 20445 2014 2017 1 1
# 4 19528 2013 2013 0 0
# 5 2674 1995 1995 0 0
# 6 1419 1992 1992 0 0
# 7 15014 2004 2004 0 0
# 8 10946 2009 2012 1 1
# 9 4369 1998 1999 1 1
# 10 4344 1992 1999 2 1
#....90 more rows
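To show what the query does step by step, here is a base-R sketch that follows the same plan: a cross join of dat and mastDf (the FROM dat, mastDf clause), a filter with BETWEEN semantics (inclusive at both ends), a per-squirrel count (GROUP BY), and a left join back onto dat in which missing counts become 0, as IFNULL does. It is a translation of the query's logic for illustration, not part of the sqldf answer, and the dat rows are copied from the printed output in this thread.

```r
# Sample rows copied from the printed output in this thread (first 10 individuals)
dat <- data.frame(
  squirrel_id = c(6715L, 22274L, 20445L, 19528L, 2674L,
                  1419L, 15014L, 10946L, 4369L, 4344L),
  byear = c(2006L, 2016L, 2014L, 2013L, 1995L, 1992L, 2004L, 2009L, 1998L, 1992L),
  dyear = c(2006L, 2017L, 2017L, 2013L, 1995L, 1992L, 2004L, 2012L, 1999L, 1999L)
)
mastDf <- data.frame(year = as.integer(c("1993", "1998", "2005", "2010", "2014")))

# 1. FROM dat, mastDf: by = NULL makes merge() return the Cartesian product
pairs <- merge(dat, mastDf, by = NULL)

# 2. WHERE year BETWEEN byear AND dyear (both ends included)
pairs <- pairs[pairs$year >= pairs$byear & pairs$year <= pairs$dyear, ]

# 3. GROUP BY squirrel_id with a row count
counts <- aggregate(year ~ squirrel_id, data = pairs, FUN = length)
names(counts)[2] <- "n_Mast"

# 4. LEFT OUTER JOIN back onto dat; unmatched rows get NA, which IFNULL turns into 0
res <- merge(dat, counts, by = "squirrel_id", all.x = TRUE)
res$n_Mast[is.na(res$n_Mast)] <- 0L
res$inMast <- as.integer(res$n_Mast > 0)

# merge() sorts by the key, so restore the original row order of dat
res <- res[match(dat$squirrel_id, res$squirrel_id), ]
rownames(res) <- NULL
print(res)
```

Note that aggregate() stops with an error when no individual overlaps any mast year, so on other data step 3 may need a guard for an empty pairs table.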