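Creating an index conditional on values in another column, over time

Time: 2014-09-04 00:00:29

Tags: r plyr seq

I am struggling with the following problem: the data frame below contains the development of values over time for various ids. What I am trying to get is the increase/decrease of these values relative to the value in the year in which an event occurred. Several events can occur within one id, so each new event becomes the new base year for that id. To make things clearer, I have also added my desired result below.

What I have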

id  value   year    event
a   100     1950    NA
a   101     1951    NA
a   102     1952    NA
a   103     1953    NA
a   104     1954    NA
a   105     1955    X
a   106     1956    NA
a   107     1957    NA
a   108     1958    NA
a   107     1959    Y
a   106     1960    NA
a   105     1961    NA
a   104.8   1962    NA
a   104.2   1963    NA
b   70      1970    NA
b   75      1971    NA
b   80      1972    NA
b   85      1973    NA
b   90      1974    NA
b   60      1975    Z
b   59      1976    NA
b   58      1977    NA
b   57      1978    NA
b   56      1979    NA
b   55      1980    W
b   54      1981    NA
b   53      1982    NA
b   52      1983    NA
b   51      1984    NA
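
For anyone who wants to run the code in this thread, the table above can be rebuilt roughly as follows. This is only a sketch: the object name `dat` and the choice of a character `event` column with real `NA` values are assumptions, since the question does not show how the data frame was created.

```r
# Rebuild the example data (the name `dat` is an assumption)
dat <- data.frame(
  id    = rep(c("a", "b"), times = c(14, 15)),
  value = c(100:108, 107, 106, 105, 104.8, 104.2,   # id a, 1950-1963
            70, 75, 80, 85, 90, 60:51),             # id b, 1970-1984
  year  = c(1950:1963, 1970:1984),
  event = NA_character_,
  stringsAsFactors = FALSE
)

# Years are unique across ids in this example, so match() on year is enough
dat$event[match(c(1955, 1959, 1975, 1980), dat$year)] <- c("X", "Y", "Z", "W")

str(dat)
```
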

What I am looking for

id  value   year    event   index   growth
a   100     1950    NA        0 
a   101     1951    NA        0 
a   102     1952    NA        0 
a   103     1953    NA        0 
a   104     1954    NA        0 
a   105     1955    X         1      1
a   106     1956    NA        2      1.00952381
a   107     1957    NA        3      1.019047619
a   108     1958    NA        4      1.028571429
a   107     1959    Y         1      1                  #new baseline year
a   106     1960    NA        2      0.990654206
a   105     1961    NA        3      0.981308411
a   104.8   1962    NA        4      0.979439252
a   104.2   1963    NA        5      0.973831776
b   70      1970    NA        6 
b   75      1971    NA        7 
b   80      1972    NA        8 
b   85      1973    NA        9 
b   90      1974    NA       10 
b   60      1975    Z         1      1
b   59      1976    NA        2      0.983333333
b   58      1977    NA        3      0.966666667
b   57      1978    NA        4      0.95
b   56      1979    NA        5      0.933333333
b   55      1980    W         1      1                #new baseline year
b   54      1981    NA        2      0.981818182
b   53      1982    NA        3      0.963636364
b   52      1983    NA        4      0.945454545
b   51      1984    NA        5      0.927272727

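What I have tried

This and this post were very helpful, and I managed to compute the differences between years. However, I fail to reset the base year (index) when there is a new event. Moreover, I doubt whether my approach is really the most efficient/elegant one. It seems a bit clumsy to me...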

library(plyr)

x <- ddply(x, .(id), transform, year.min = min(year[!is.na(event)]))  # identify the first event year per id
x1 <- ddply(x[x$year >= x$year.min, ], .(id), transform, index = seq_along(id))  # counter for the years from the first event onward; earlier years are dropped
x1 <- x1[order(x1$id, x1$year), ]  # sort
x1 <- ddply(x1, .(id), transform, growth = 100 * (value / value[1]))  # growth, but always relative to the first event year; this is wrong

# I then merge the df with the years prior to the first event, which were removed at the beginning
# (interaction() is part of base R, so no extra package needs to be loaded)
x$id.year <- interaction(x$id, x$year)
x1$id.year <- interaction(x1$id, x1$year)
x$index <- x$growth <- NA
y <- rbind(x[x$year < x$year.min, ], x1)
y <- y[order(y$id, y$year), ]

Many thanks for any advice.

3 answers:

Answer 0 (score: 2)

# Create a tag to indicate the start of each new event by id or
# when id changes
dat$tag <- with(dat, ave(as.character(event), as.character(id), 
                                    FUN=function(i) cumsum(!is.na(i))))

# Calculate the growth by id and tag
# this will also produce results for each id before an event has happened
dat$growth <- with(dat, ave(value, tag, id,  FUN=function(i)  i/i[1] ))

# remove growth prior to an event (this is where tag equals zero, as no
# event has occurred yet)
dat$growth[dat$tag==0] <- NA
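
The code above produces `growth` but not the `index` column the question asks for. The same tag idea could be extended along these lines. This is a sketch under assumptions: the data construction is repeated so the block runs on its own, `dat` is assumed to be sorted by id and year, and rows before the first event get index 0 for every id (the desired table shows 6 to 10 for id b there, which this sketch does not reproduce).

```r
# Example data, rebuilt here so the block is self-contained
dat <- data.frame(
  id    = rep(c("a", "b"), times = c(14, 15)),
  value = c(100:108, 107, 106, 105, 104.8, 104.2, 70, 75, 80, 85, 90, 60:51),
  year  = c(1950:1963, 1970:1984),
  event = NA_character_,
  stringsAsFactors = FALSE
)
dat$event[match(c(1955, 1959, 1975, 1980), dat$year)] <- c("X", "Y", "Z", "W")

# Running count of events within each id: 0 before the first event
dat$tag <- ave(as.integer(!is.na(dat$event)), dat$id, FUN = cumsum)

# Position within each id/tag block: 1 in the event year, 2 the year after, ...
dat$index <- ave(seq_along(dat$tag), dat$id, dat$tag, FUN = seq_along)

# Value relative to the first value of the block, i.e. the event year
dat$growth <- ave(dat$value, dat$id, dat$tag, FUN = function(v) v / v[1])

# Rows before any event: no baseline exists yet (index 0 is an assumption)
dat$index[dat$tag == 0]  <- 0L
dat$growth[dat$tag == 0] <- NA

dat
```

Because the blocks are defined by a running count rather than by the event label, this also works when the same label occurs more than once within an id.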

Answer 1 (score: 1)

Here is a solution with dplyr.

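library(dplyr)
library(zoo)   # provides na.locf()

ana <- group_by(mydf, id) %>%
       do(na.locf(., na.rm = FALSE)) %>%      # carry each event label forward within id
       mutate(value = as.numeric(value)) %>%
       group_by(id, event) %>%
       mutate(growth = value/value[1]) %>%    # value relative to the first row of each id/event block
       mutate(index = row_number())           # position within each id/event block

ana$growth[is.na(ana$event)] <- 0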

   id value year event    growth index
1   a 100.0 1950    NA 0.0000000     1
2   a 101.0 1951    NA 0.0000000     2
3   a 102.0 1952    NA 0.0000000     3
4   a 103.0 1953    NA 0.0000000     4
5   a 104.0 1954    NA 0.0000000     5
6   a 105.0 1955     X 1.0000000     1
7   a 106.0 1956     X 1.0095238     2
8   a 107.0 1957     X 1.0190476     3
9   a 108.0 1958     X 1.0285714     4
10  a 107.0 1959     Y 1.0000000     1
11  a 106.0 1960     Y 0.9906542     2
12  a 105.0 1961     Y 0.9813084     3
13  a 104.8 1962     Y 0.9794393     4
14  a 104.2 1963     Y 0.9738318     5
15  b  70.0 1970    NA 0.0000000     1
16  b  75.0 1971    NA 0.0000000     2
17  b  80.0 1972    NA 0.0000000     3
18  b  85.0 1973    NA 0.0000000     4
19  b  90.0 1974    NA 0.0000000     5
20  b  60.0 1975     Z 1.0000000     1
21  b  59.0 1976     Z 0.9833333     2
22  b  58.0 1977     Z 0.9666667     3
23  b  57.0 1978     Z 0.9500000     4
24  b  56.0 1979     Z 0.9333333     5
25  b  55.0 1980     W 1.0000000     1
26  b  54.0 1981     W 0.9818182     2
27  b  53.0 1982     W 0.9636364     3
28  b  52.0 1983     W 0.9454545     4
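
Grouping by the carried-forward event label assumes that a label never repeats within an id. A variant that groups on a running event count instead, and needs no zoo, might look like the following. It is a sketch, not the answerer's method: the names `dat`, `tag` and `res` are assumptions, and pre-event rows are given index 0 and growth `NA` rather than the values printed above.

```r
library(dplyr)

# Example data, rebuilt here so the block is self-contained
dat <- data.frame(
  id    = rep(c("a", "b"), times = c(14, 15)),
  value = c(100:108, 107, 106, 105, 104.8, 104.2, 70, 75, 80, 85, 90, 60:51),
  year  = c(1950:1963, 1970:1984),
  event = NA_character_,
  stringsAsFactors = FALSE
)
dat$event[match(c(1955, 1959, 1975, 1980), dat$year)] <- c("X", "Y", "Z", "W")

res <- dat %>%
  group_by(id) %>%
  mutate(tag = cumsum(!is.na(event))) %>%   # 0 before the first event, then 1, 2, ...
  group_by(id, tag) %>%
  mutate(
    index  = if_else(tag == 0L, 0L, row_number()),               # restart at each event
    growth = if_else(tag == 0L, NA_real_, value / first(value))  # relative to the event year
  ) %>%
  ungroup()

print(res, n = Inf)
```
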

Answer 2 (score: 0)

Try:

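ddf$index <- 0
ddf$growth <- 0
baseline <- 0
start <- FALSE  # becomes TRUE at the first event and is never reset, not even when id changes
for (r in 1:nrow(ddf)) {
    if (is.na(ddf$event[r])) {
        if (start) {
            # no event in this row: keep counting from the previous row
            ddf$index[r] <- ddf$index[r-1] + 1
            ddf$growth[r] <- ddf$value[r] / baseline
        } else {
            ddf$index[r] <- 0
        }
    } else {
        # event row: restart the counter and use this value as the new baseline
        start <- TRUE
        ddf$index[r] <- 1
        ddf$growth[r] <- 1
        baseline <- ddf$value[r]
    }
}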

ddf
   id value year event index    growth
1   a 100.0 1950  <NA>     0 0.0000000
2   a 101.0 1951  <NA>     0 0.0000000
3   a 102.0 1952  <NA>     0 0.0000000
4   a 103.0 1953  <NA>     0 0.0000000
5   a 104.0 1954  <NA>     0 0.0000000
6   a 105.0 1955     X     1 1.0000000
7   a 106.0 1956  <NA>     2 1.0095238
8   a 107.0 1957  <NA>     3 1.0190476
9   a 108.0 1958  <NA>     4 1.0285714
10  a 107.0 1959     Y     1 1.0000000
11  a 106.0 1960  <NA>     2 0.9906542
12  a 105.0 1961  <NA>     3 0.9813084
13  a 104.8 1962  <NA>     4 0.9794393
14  a 104.2 1963  <NA>     5 0.9738318
15  b  70.0 1970  <NA>     6 0.6542056
16  b  75.0 1971  <NA>     7 0.7009346
17  b  80.0 1972  <NA>     8 0.7476636
18  b  85.0 1973  <NA>     9 0.7943925
19  b  90.0 1974  <NA>    10 0.8411215
20  b  60.0 1975     Z     1 1.0000000
21  b  59.0 1976  <NA>     2 0.9833333
22  b  58.0 1977  <NA>     3 0.9666667
23  b  57.0 1978  <NA>     4 0.9500000
24  b  56.0 1979  <NA>     5 0.9333333
25  b  55.0 1980     W     1 1.0000000
26  b  54.0 1981  <NA>     2 0.9818182
27  b  53.0 1982  <NA>     3 0.9636364
28  b  52.0 1983  <NA>     4 0.9454545
29  b  51.0 1984  <NA>     5 0.9272727
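
In the output above, rows 15 to 19 (id b before its first event) continue the counter and the baseline from id a, because the loop keeps its state across ids. If that is not wanted, the state could be reset whenever the id changes. A minimal sketch, assuming the rows are sorted by id and year; the function name `add_index_growth` and the choice of `NA` growth before the first event are assumptions:

```r
# Example data, rebuilt here so the block is self-contained
dat <- data.frame(
  id    = rep(c("a", "b"), times = c(14, 15)),
  value = c(100:108, 107, 106, 105, 104.8, 104.2, 70, 75, 80, 85, 90, 60:51),
  year  = c(1950:1963, 1970:1984),
  event = NA_character_,
  stringsAsFactors = FALSE
)
dat$event[match(c(1955, 1959, 1975, 1980), dat$year)] <- c("X", "Y", "Z", "W")

# Hypothetical helper: same loop idea, but the state is reset for every new id
add_index_growth <- function(df) {
  df$index  <- 0
  df$growth <- NA_real_
  started   <- FALSE
  baseline  <- NA_real_
  for (r in seq_len(nrow(df))) {
    # A new id starts: forget the previous id's event and baseline
    if (r == 1 || df$id[r] != df$id[r - 1]) {
      started  <- FALSE
      baseline <- NA_real_
    }
    if (!is.na(df$event[r])) {
      # Event row: new baseline year
      started      <- TRUE
      baseline     <- df$value[r]
      df$index[r]  <- 1
      df$growth[r] <- 1
    } else if (started) {
      # Year after an event: continue the counter, compare to the baseline
      df$index[r]  <- df$index[r - 1] + 1
      df$growth[r] <- df$value[r] / baseline
    }
  }
  df
}

res <- add_index_growth(dat)
res
```
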