I am trying to recode existing data that have an over-time (longitudinal) structure. My dataset looks like this:
dput(z)
structure(list(democracy = c(0L, 0L, 0L, 0L, 0L, 0L, 1L, 1L,
1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L), year.x = 1967:2008, time = c(1, 2, 3, 4, 5, 6, 7, 8,
9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24,
25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40,
41, 42)), .Names = c("democracy", "year.x", "time"), row.names = 176:217, class = "data.frame")
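So I would like to create a new variable, say time.democ, that takes the value zero when democracy==0 and restarts counting time periods, starting from 1, whenever democracy==1, returning to zero once democracy==0 again. I will be doing this for a range of countries, but I assume that once the function is right it can be generalized easily with ddply. Any suggestions?
This is what I would like to get: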
dput(z)
structure(list(democracy = c(0L, 0L, 0L, 0L, 0L, 0L, 1L, 1L,
1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L), year.x = 1967:2008, time = c(1, 2, 3, 4, 5, 6, 7, 8,
9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24,
25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40,
41, 42), new.time = c(0, 0, 0, 0, 0, 0, 0, 1, 2, 0, 0, 0, 0,
0, 0, 0, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15,
16, 17, 18, 19, 20, 21, 22, 23, 24, 25)), .Names = c("democracy",
"year.x", "time", "new.time"), row.names = 176:217, class = "data.frame")
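Thanks!
Answer 0 (score: 1)
You can do this by combining rle with sequence. rle computes a run-length encoding of a vector (the length and value of each run of consecutive identical entries), and sequence takes those run lengths and generates a counter 1, 2, ..., n for each run.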
z$new.time <- sequence(rle(z$democracy)$lengths)
z$new.time[z$democracy==0] <- 0
head(z, 20)
    democracy year.x time new.time
176         0   1967    1        0
177         0   1968    2        0
178         0   1969    3        0
179         0   1970    4        0
180         0   1971    5        0
181         0   1972    6        0
182         1   1973    7        1
183         1   1974    8        2
184         1   1975    9        3
185         0   1976   10        0
186         0   1977   11        0
187         0   1978   12        0
188         0   1979   13        0
189         0   1980   14        0
190         0   1981   15        0
191         0   1982   16        0
192         1   1983   17        1
193         1   1984   18        2
194         1   1985   19        3
195         1   1986   20        4
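To make the two steps easier to follow, here is a minimal sketch on a short toy vector. The vector x and the names runs and counter are made up for illustration and do not come from the question's data.

```r
# Toy 0/1 vector (hypothetical, not the question's data)
x <- c(0, 0, 1, 1, 1, 0, 1, 1)

# rle() describes the vector as runs of identical consecutive values
runs <- rle(x)
runs$lengths   # 2 3 1 2
runs$values    # 0 1 0 1

# sequence() concatenates 1:n for every run length n,
# which gives a counter that restarts at each run
counter <- sequence(runs$lengths)   # 1 2 1 2 3 1 1 2

# Keep the counter only where x is 1; set it to 0 elsewhere
counter[x == 0] <- 0
counter   # 0 0 1 2 3 0 1 2
```

With this approach the first year of each democratic spell is counted as 1. Subtracting 1 from the sequence() result before zeroing the non-democratic years would make each spell start at 0 instead.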
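Answer 1 (score: 0)
Thanks for the reply. I followed your suggestion and ended up writing a function so that I can apply it to all units in my (longitudinal) dataset via ddply. I am posting it in case it helps others, although I am sure there are more elegant solutions: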
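library(plyr)  # provides ddply()
# The input is a long-format data frame
new.time <- function(a){
  # sort by year so that runs are counted in chronological order
  a <- a[order(a$year.x),]
  # position within each run of identical democracy values, starting at 0
  a$new.time <- sequence(rle(a$democracy)$lengths)-1
  # non-democratic years are always 0
  a$new.time[a$democracy==0] <- 0
  return(a)
}
# apply the function to each country separately
merged1 <- ddply(merged, .(country.x), new.time)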
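For readers without plyr, the same per-country logic can be sketched in base R with split, lapply and do.call. The two-country panel below is invented for illustration; only the column names and the body of new.time follow the post.

```r
# Hypothetical two-country panel; column names follow the post
merged <- data.frame(
  country.x = rep(c("A", "B"), each = 5),
  year.x    = rep(2001:2005, times = 2),
  democracy = c(0, 1, 1, 0, 1,
                1, 1, 0, 0, 1)
)

# Same logic as the function in the post (first democratic year is 0)
new.time <- function(a) {
  a <- a[order(a$year.x), ]
  a$new.time <- sequence(rle(a$democracy)$lengths) - 1
  a$new.time[a$democracy == 0] <- 0
  a
}

# Split by country, apply the function to each piece, then recombine
pieces  <- lapply(split(merged, merged$country.x), new.time)
merged1 <- do.call(rbind, pieces)
rownames(merged1) <- NULL

merged1$new.time   # 0 0 1 0 0 0 1 0 0 0
```

Running rle separately on each country matters: applied to the stacked data in one pass, a run at the end of one country could merge with a run of the same value at the start of the next.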