时间滞后变量

时间:2017-06-19 17:31:50

标签: r

不幸的是,我需要完全重新编辑帖子,因为我最初建议的数据片段并不完全反映我的数据集。基本上我的数据集如下所示:

date       id     person   member_since   consistent
May 2015   2732   3041     2008-09-17     Yes
Jul 2015   2732   3041     2008-09-17     Yes
Sep 2015   2732   3041     2008-09-17     Yes
Nov 2015   2732   3041     2008-09-17     Yes
Jan 2016   2732   3041     2008-09-17     Yes
May 2016   2732   3041     2008-09-17     Yes
Jul 2016   2732   3041     2008-09-17     Yes
Aug 2016   2732   3041     2008-09-17     Yes
Mar 2017   2732   3041     2008-09-17     Yes
Apr 2017   2732   3041     2008-09-17     Yes
May 2017   2732   3041     2008-09-17     Yes

我现在想要按月创建条目,直到" member_since"的月份,保持一切不变(见下文)。此外,还会有一些变量标记为"一致" =否(此处未显示),应保持原样。

date       id     person   member_since   consistent
Sep 2008   2732   3041     2008-09-17     Yes
Oct 2008   2732   3041     2008-09-17     Yes
Nov 2008   2732   3041     2008-09-17     Yes
.
.
.
May 2015   2732   3041     2008-09-17     Yes
Jul 2015   2732   3041     2008-09-17     Yes
Sep 2015   2732   3041     2008-09-17     Yes
Nov 2015   2732   3041     2008-09-17     Yes
Jan 2016   2732   3041     2008-09-17     Yes
May 2016   2732   3041     2008-09-17     Yes
Jul 2016   2732   3041     2008-09-17     Yes
Aug 2016   2732   3041     2008-09-17     Yes
Mar 2017   2732   3041     2008-09-17     Yes
Apr 2017   2732   3041     2008-09-17     Yes
May 2017   2732   3041     2008-09-17     Yes

你能帮我解释一下代码吗?

我认为下面提出的代码是在正确的轨道上,但是还没有达到预期的结果。

library(dplyr)
library(zoo)

monthly_series <- (seq.Date(min(as.Date(df$member_since)),
                            as.Date("2017-05-01"),by="months"))


df2 <-
  df%>%
  group_by(Person)%>%
  merge(monthly_series)%>%
  dplyr::filter((consistent=="Yes"&as.yearmon(as.Date(member_since))<=as.yearmon(y))|
                  (consistent=="No"& as.yearmon(as.Date(member_since))==as.yearmon(y))) %>%
  mutate(Month=as.yearmon(y)) %>%
  select(-y) %>%
  arrange(Person)

提前多多感谢!

4 个答案:

答案 0 :(得分:1)

这是通过Base R的想法,

#convert variables to dates and characters,
df$Member_since <- as.Date(df$Member_since, format = '%Y-%m-%d')
df[c(1, 3)] <- lapply(df[c(1,3)], as.character)
str(df)
'data.frame': 3 obs. of  3 variables:
$ Person      : chr  "A" "B" "C"
$ Member_since: Date, format: "2013-08-04" "2011-12-23" "2014-09-03"
$ Applicable  : chr  "Yes" "No" "Yes"
#create a list with monthly sequence of dates
l1 <- sapply(df$Member_since[df$Applicable == 'Yes'], function(i) 
                                        seq.Date(i, as.Date('2017-05-01'), by = 'month'))
#Create an extra variable based on the lengths of l1 to act as counter for expanding the data frame
df$cnt[df$Applicable == 'Yes'] <- lengths(l1)
#Change NA to 1 so it does not expand
df[is.na(df)] <- 1
#expand data frame
df1 <- df[rep(rownames(df), df$cnt),]
#create your Month variable by substituting the dates from l1
df1$Month[df1$Applicable == 'Yes'] <- format(do.call(c, l1), format = '%b %Y')
#take care of the NAs emerged from applicable being NO
df1$Month <- mapply(function(x, y) 
                replace(x, is.na(x), format(y, '%b %Y')), df1$Month, df1$Member_since)
#drop unwanted columns
df1$cnt <- NULL

head(df1)
#    Person Member_since Applicable    Month
#1        A   2013-08-04        Yes Aug 2013
#1.1      A   2013-08-04        Yes Sep 2013
#1.2      A   2013-08-04        Yes Oct 2013
#1.3      A   2013-08-04        Yes Nov 2013
#1.4      A   2013-08-04        Yes Dec 2013
#1.5      A   2013-08-04        Yes Jan 2014

tail(df1)
#     Person Member_since Applicable    Month
#3.26      C   2014-09-03        Yes Nov 2016
#3.27      C   2014-09-03        Yes Dec 2016
#3.28      C   2014-09-03        Yes Jan 2017
#3.29      C   2014-09-03        Yes Feb 2017
#3.30      C   2014-09-03        Yes Mar 2017
#3.31      C   2014-09-03        Yes Apr 2017

数据

dput(df)
structure(list(Person = structure(1:3, .Label = c("A", "B", "C"
), class = "factor"), Member_since = structure(c(2L, 1L, 3L), .Label = c("2011-12-23", 
"2013-08-04", "2014-09-03"), class = "factor"), Applicable = structure(c(2L, 
1L, 2L), .Label = c("No", "Yes"), class = "factor")), .Names = c("Person", 
"Member_since", "Applicable"), class = "data.frame", row.names = c(NA, 
-3L))

答案 1 :(得分:1)

以下是dplyryearmonzoo的帮助下如何做到这一点。

基本上,我们的想法是使用原始data.frame创建一个带有seq.Datemerge的月度日期向量。然后,您可以filter输出不需要的行。我正在使用dplyr::filter因为filter被我机器上的另一个包屏蔽了。在你的身上,单独使用filter就足够了。

library(dplyr);library(zoo)
df <- read.table(text="Person Member_since  Applicable
A      2013-08-04    Yes
B      2011-12-23    No
C      2014-09-03    Yes",header=TRUE,stringsAsFactors=FALSE)

monthly_series <- (seq.Date(min(as.Date(df$Member_since)),
                     as.Date("2017-05-01"),by="months"))

df%>%
  group_by(Person)%>%
  merge(monthly_series)%>%
  dplyr::filter((Applicable=="Yes"&as.yearmon(as.Date(Member_since))<=as.yearmon(y))|
                (Applicable=="No"& as.yearmon(as.Date(Member_since))==as.yearmon(y))) %>%
  mutate(Month=as.yearmon(y)) %>%
  select(-y) %>%
  arrange(Person)



   Person Member_since Applicable    Month
1       A   2013-08-04        Yes Aug 2013
2       A   2013-08-04        Yes Sep 2013
3       A   2013-08-04        Yes Oct 2013
4       A   2013-08-04        Yes Nov 2013
5       A   2013-08-04        Yes Dec 2013
6       A   2013-08-04        Yes Jan 2014
7       A   2013-08-04        Yes Feb 2014
8       A   2013-08-04        Yes Mar 2014
9       A   2013-08-04        Yes Apr 2014
10      A   2013-08-04        Yes May 2014
11      A   2013-08-04        Yes Jun 2014
12      A   2013-08-04        Yes Jul 2014
13      A   2013-08-04        Yes Aug 2014
14      A   2013-08-04        Yes Sep 2014
15      A   2013-08-04        Yes Oct 2014
16      A   2013-08-04        Yes Nov 2014
17      A   2013-08-04        Yes Dec 2014
18      A   2013-08-04        Yes Jan 2015
19      A   2013-08-04        Yes Feb 2015
20      A   2013-08-04        Yes Mar 2015
21      A   2013-08-04        Yes Apr 2015
22      A   2013-08-04        Yes May 2015
23      A   2013-08-04        Yes Jun 2015
24      A   2013-08-04        Yes Jul 2015
25      A   2013-08-04        Yes Aug 2015
26      A   2013-08-04        Yes Sep 2015
27      A   2013-08-04        Yes Oct 2015
28      A   2013-08-04        Yes Nov 2015
29      A   2013-08-04        Yes Dec 2015
30      A   2013-08-04        Yes Jan 2016
31      A   2013-08-04        Yes Feb 2016
32      A   2013-08-04        Yes Mar 2016
33      A   2013-08-04        Yes Apr 2016
34      A   2013-08-04        Yes May 2016
35      A   2013-08-04        Yes Jun 2016
36      A   2013-08-04        Yes Jul 2016
37      A   2013-08-04        Yes Aug 2016
38      A   2013-08-04        Yes Sep 2016
39      A   2013-08-04        Yes Oct 2016
40      A   2013-08-04        Yes Nov 2016
41      A   2013-08-04        Yes Dec 2016
42      A   2013-08-04        Yes Jan 2017
43      A   2013-08-04        Yes Feb 2017
44      A   2013-08-04        Yes Mar 2017
45      A   2013-08-04        Yes Apr 2017
46      B   2011-12-23         No Dec 2011
47      C   2014-09-03        Yes Sep 2014
48      C   2014-09-03        Yes Oct 2014
49      C   2014-09-03        Yes Nov 2014
50      C   2014-09-03        Yes Dec 2014
51      C   2014-09-03        Yes Jan 2015
52      C   2014-09-03        Yes Feb 2015
53      C   2014-09-03        Yes Mar 2015
54      C   2014-09-03        Yes Apr 2015
55      C   2014-09-03        Yes May 2015
56      C   2014-09-03        Yes Jun 2015
57      C   2014-09-03        Yes Jul 2015
58      C   2014-09-03        Yes Aug 2015
59      C   2014-09-03        Yes Sep 2015
60      C   2014-09-03        Yes Oct 2015
61      C   2014-09-03        Yes Nov 2015
62      C   2014-09-03        Yes Dec 2015
63      C   2014-09-03        Yes Jan 2016
64      C   2014-09-03        Yes Feb 2016
65      C   2014-09-03        Yes Mar 2016
66      C   2014-09-03        Yes Apr 2016
67      C   2014-09-03        Yes May 2016
68      C   2014-09-03        Yes Jun 2016
69      C   2014-09-03        Yes Jul 2016
70      C   2014-09-03        Yes Aug 2016
71      C   2014-09-03        Yes Sep 2016
72      C   2014-09-03        Yes Oct 2016
73      C   2014-09-03        Yes Nov 2016
74      C   2014-09-03        Yes Dec 2016
75      C   2014-09-03        Yes Jan 2017
76      C   2014-09-03        Yes Feb 2017
77      C   2014-09-03        Yes Mar 2017
78      C   2014-09-03        Yes Apr 2017

答案 2 :(得分:1)

你应该检查这个包:https://cran.r-project.org/web/packages/padr/vignettes/padr.html

您的答案将如下所示: let container = CKContainer(identifier: kContainerName) let containerDescription = String(describing: container) if containerDescription.contains("containerEnvironment=\"Sandbox\"") { showTestMode() // or whatever you want to do differently } 。 然后我会df2 <- df %>% filter(Applicable == "Yes") %>% pad(group = "Person", start_val = Member_since, end_val = "2017-05-04", interval = "month"),这样你就可以在一个单独的数据集中得到所有“否”的答案。 最后,您可以使用append_df <- df %>% filter(Applicable == "No")重新添加。这应该会让你走上正确的道路。

答案 3 :(得分:0)

这是您拥有的众多选项的plyr替代方案。您的示例数据:

library(plyr)

df <- read.table( text = "Person Member_since  Applicable
A      2013-08-04    Yes
B      2011-12-23    No
C      2014-09-03    Yes", header = TRUE)

确保Member_since采用日期格式:

df$Member_since <- as.Date(df$Member_since)

如果adply为“是”,则使用Applicable迭代行并使用连续月份构建新行,如果Applicable为“否”,则使用原始行。

df2<-adply(df, 1, function(x) {
  if(x$Applicable == "Yes"){
    data.frame(x, month = format(
      seq.Date(as.Date(x$Member_since), as.Date("2017-06-01"), by = "month"),  
      "%b %Y"))
  }
  else{
    data.frame(x, month = format(x$Member_since, "%b %Y"))
  }
}
)

> head(df2);tail(df2)
  Person Member_since Applicable    month
1      A   2013-08-04        Yes Aug 2013
2      A   2013-08-04        Yes Sep 2013
3      A   2013-08-04        Yes Oct 2013
4      A   2013-08-04        Yes Nov 2013
5      A   2013-08-04        Yes Dec 2013
6      A   2013-08-04        Yes Jan 2014
   Person Member_since Applicable    month
75      C   2014-09-03        Yes Dec 2016
76      C   2014-09-03        Yes Jan 2017
77      C   2014-09-03        Yes Feb 2017
78      C   2014-09-03        Yes Mar 2017
79      C   2014-09-03        Yes Apr 2017
80      C   2014-09-03        Yes May 2017