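Hello, I'm a Stata user and I'm trying to port my code to R. I have panel data like the example below, and I'm looking for a command that creates a variable whose value is determined by the year and quarter of each row. In Stata this is done with gen new_variable = yq(year, quarter).
My data frame looks like this: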
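For reference, Stata's `yq(Y, Q)` returns the number of quarters elapsed since 1960q1, so `yq(1960, 1)` is 0 and `yq(2007, 1)` is 188. A minimal R sketch of the same arithmetic, using a hypothetical helper named `yq` and assuming `year` and `quarter` hold whole numbers:

```r
# Hypothetical helper mirroring Stata's yq(): quarters since 1960 Q1 (1960 Q1 = 0)
yq <- function(year, quarter) {
  4L * (as.integer(year) - 1960L) + as.integer(quarter) - 1L
}

yq(1960, 1)  # 0
yq(2007, 1)  # 188
yq(2010, 3)  # 202
```

Applied to a data frame, `df$new_variable <- yq(df$year, df$quarter)` gives one value per year-quarter that increases by 1 from one calendar quarter to the next, including across year boundaries.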
id year quarter
1 2007 1
1 2007 2
1 2007 3
1 2007 4
1 2008 1
1 2008 2
1 2008 3
1 2008 4
1 2009 1
1 2009 2
1 2009 3
1 2009 4
2 2007 1
2 2007 2
2 2007 3
2 2007 4
2 2008 1
2 2008 2
2 2008 3
2 2008 4
3 2009 2
3 2009 3
3 2010 2
3 2010 3
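My expected output should look like this (the values in new_variable are arbitrary; I'm just looking for a value that is always the same for a given year and quarter):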
id year quarter new_variable
1 2007 1 220
1 2007 2 221
1 2007 3 222
1 2007 4 223
1 2008 1 224
1 2008 2 225
1 2008 3 226
1 2008 4 227
1 2009 1 228
1 2009 2 229
1 2009 3 230
1 2009 4 231
2 2007 1 220
2 2007 2 221
2 2007 3 222
2 2007 4 223
2 2008 1 224
2 2008 2 225
2 2008 3 226
2 2008 4 227
3 2009 2 229
3 2009 3 230
3 2010 2 233
3 2010 3 234
Answer 0 (score: 3)
Any of these would work:
# basic: just concatenate year and quarter
df$new_variable = paste(df$year, df$quarter)
# made for this, has additional options around
# ordering of the categories and including unobserved combos
df$new_variable = interaction(df$year, df$quarter)
# for an integer value, 1 to the number of combos
df$new_variable = as.integer(factor(paste(df$year, df$quarter)))
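On the ordering point: with its defaults, `interaction()` varies the first factor fastest, so its integer codes are not in chronological order. Setting `lex.order = TRUE` orders levels by year and then quarter, and `drop = TRUE` removes combinations that never occur. A small illustration on a made-up four-row data frame:

```r
# Made-up example rows (not the question's data)
df <- data.frame(year    = c(2007L, 2007L, 2008L, 2010L),
                 quarter = c(1L, 4L, 1L, 2L))

# Default ordering: year varies fastest, so codes are not chronological
f_default <- interaction(df$year, df$quarter, drop = TRUE)
levels(f_default)      # "2007.1" "2008.1" "2010.2" "2007.4"
as.integer(f_default)  # 1 4 2 3

# lex.order = TRUE: year varies slowest, giving year-then-quarter order
f_lex <- interaction(df$year, df$quarter, lex.order = TRUE, drop = TRUE)
levels(f_lex)          # "2007.1" "2007.4" "2008.1" "2010.2"
as.integer(f_lex)      # 1 2 3 4
```

The third option above, `as.integer(factor(paste(...)))`, sorts its levels alphabetically, which happens to be chronological here because every year has four digits and every quarter one digit.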
Answer 1 (score: 2)
Here are two options:
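library(dplyr) # with dplyr (>= 1.0.0, where cur_group_id() replaces group_indices(., ...))
df %>% group_by(year, quarter) %>% mutate(new_variable = cur_group_id()) %>% ungroup()
library(data.table) # with data.table (setDT converts df to a data.table by reference)
setDT(df)[, new_variable := .GRP, by = .(year, quarter)]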
Data:
df <- structure(list(id = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L), year = c(2007L,
2007L, 2007L, 2007L, 2008L, 2008L, 2008L, 2008L, 2009L, 2009L,
2009L, 2009L, 2007L, 2007L, 2007L, 2007L, 2008L, 2008L, 2008L,
2008L, 2009L, 2009L, 2010L, 2010L), quarter = c(1L, 2L, 3L, 4L,
1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L,
2L, 3L, 2L, 3L)), .Names = c("id", "year", "quarter"), class = "data.frame", row.names = c(NA,
-24L))
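Both options number only the (year, quarter) combinations present in the data, so a quarter that never occurs (2010 Q1 in the question's data) gets no number and the next observed quarter simply takes the next integer. As far as I know, data.table's `.GRP` with `by` counts groups in order of first appearance, while dplyr's `group_by()` sorts the groups. A base-R sketch of both numbering schemes on made-up rows that are deliberately out of chronological order:

```r
# Made-up rows, not in chronological order; 2010 Q1 never occurs
df <- data.frame(year    = c(2010L, 2009L, 2010L, 2009L, 2010L),
                 quarter = c(2L,    4L,    3L,    4L,    2L))
key <- paste(df$year, df$quarter)

# Numbered in order of first appearance (like .GRP with by =)
df$first_seen_id <- match(key, unique(key))
# Numbered in sorted order (like a sorted grouping such as group_by())
df$sorted_id <- match(key, sort(unique(key)))

df
#   year quarter first_seen_id sorted_id
# 1 2010       2             1         2
# 2 2009       4             2         1
# 3 2010       3             3         3
# 4 2009       4             2         1
# 5 2010       2             1         2
```

In both columns 2010 Q2 sits directly after 2009 Q4, with no number reserved for the missing 2010 Q1.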
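Answer 2 (score: 1)
1) yearqtr
The yearqtr class in the zoo package does this. yearqtr objects are of type double, with the value year + 0 for Q1, year + 1/4 for Q2, and so on. When displayed they are shown in a meaningful way (for example, 2007 Q1); however, they can still be manipulated as if they were ordinary numbers. For example, if yq is a yearqtr variable, then yq + 1 is the same quarter of the following year.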
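The numeric encoding described above can be illustrated without zoo. The helpers `to_yq_number()` and `from_yq_number()` below are hypothetical and exist only to show the year + (quarter - 1)/4 representation; zoo's yearqtr class adds formatting and methods on top of a number of this form.

```r
# Hypothetical helpers illustrating the yearqtr-style encoding (not part of zoo)
to_yq_number <- function(year, quarter) year + (quarter - 1) / 4

from_yq_number <- function(x) {
  year <- floor(x)
  # round() guards against floating-point noise before recovering the quarter
  quarter <- round((x - year) * 4) + 1
  data.frame(year = year, quarter = quarter)
}

x <- to_yq_number(2007, 3)  # 2007.5
x_next_year <- x + 1        # same quarter, next year: 2008.5
from_yq_number(c(x, x_next_year))  # years 2007 and 2008, quarter 3 in both
```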
library(zoo)
transform(df, new_variable = as.yearqtr(year + (quarter - 1)/4))
1a) Or:
transform(df, new_variable = as.yearqtr(paste(year, quarter, sep = "-")))
Either of these gives:
id year quarter new_variable
1 1 2007 1 2007 Q1
2 1 2007 2 2007 Q2
3 1 2007 3 2007 Q3
4 1 2007 4 2007 Q4
5 1 2008 1 2008 Q1
... etc ...
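2) 220: If you specifically want 220 assigned to the first date, with each subsequent calendar quarter adding 1 (so that a quarter missing from the data, such as 2010 Q1 here, still advances the count and 2010 Q2 gets 233 as expected), then:
transform(df, new_variable = 4 * year + quarter - min(4 * year + quarter) + 220)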
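A quick check of why counting calendar quarters matters here, rebuilding only the year and quarter columns for ids 1 and 3 from the question: the calendar-based count reproduces the expected 233 and 234 for 2010, while numbering only the observed quarters with factor() gives 232 and 233, because 2010 Q1 never occurs.

```r
# year/quarter pairs for ids 1 and 3 from the question (id itself is not needed)
df <- data.frame(
  year    = c(rep(2007:2009, each = 4), 2009L, 2009L, 2010L, 2010L),
  quarter = c(rep(1:4, times = 3), 2L, 3L, 2L, 3L)
)

# Calendar-based count: every calendar quarter adds 1, observed or not
calendar <- 4 * df$year + df$quarter - min(4 * df$year + df$quarter) + 220
# Factor-based count: only quarters present in the data are numbered
observed <- as.numeric(factor(4 * df$year + df$quarter)) + 220 - 1

calendar[13:16]  # 229 230 233 234 (matches the expected output)
observed[13:16]  # 229 230 232 233 (2010 shifts down by 1)
```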