Normalizing a time series and reshaping it in R

Date: 2015-04-29 03:33:10

Tags: r

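I have a dataset similar to the one below, and I want to modify a few things before reshaping it to long format and using it as a time series. Simplified data: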

 # simulated example data
 rem2008 <- rnorm(30, 10, 5)
 rem2009 <- rnorm(30, 8, 3)
 rem2010 <- rnorm(30, 23, 3)
 # LETTERS has only 26 elements, so LETTERS[1:30] includes 4 NAs (the <NA> IDs below)
 ID <- sample(LETTERS[1:30], 30)
 currency <- sample(LETTERS[1:2], 30, replace = TRUE)
 A2008 <- rnorm(30, 357, 5)
 A2009 <- rnorm(30, 357, 5)
 A2010 <- rnorm(30, 357, 5)
 B2008 <- rnorm(30, 1500, 5)
 B2009 <- rnorm(30, 1500, 5)
 B2010 <- rnorm(30, 1500, 5)

 # a data frame, not cbind(): cbind() would coerce everything into a character matrix
 data <- data.frame(currency, ID, rem2008, rem2009, rem2010, A2008, A2009, A2010, B2008, B2009, B2010)
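Two details of this setup matter for the answer below, and a quick check in R makes them visible: cbind() on a mix of character and numeric vectors produces a character matrix, which has colnames() but no names(), so code that looks columns up with names(data) needs a data frame; and LETTERS has only 26 elements, so LETTERS[1:30] contains four NAs, which is where the <NA> IDs come from. The two-element vectors here are made up purely for illustration.

```r
# cbind() on mixed character/numeric vectors returns a character matrix
currency <- c("A", "B")
rem2008  <- c(1.5, 2.5)
m <- cbind(currency, rem2008)
is.matrix(m)   # TRUE: a matrix, not a data frame
typeof(m)      # "character": the numbers were coerced to strings
names(m)       # NULL: a matrix has colnames(), not names()

# LETTERS has 26 elements, so indices 27 to 30 give NA
sum(is.na(LETTERS[1:30]))  # 4
```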

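First, I want to "normalize" all the exchange rates and remittances (all variables starting with A*, B* and rem*) by their 2008 value, and create a "treatment" column containing the relevant exchange rate: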

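# normalize by the 2008 base year (the same for rem* and B*)
data$A2009 <- data$A2009 / data$A2008
data$A2010 <- data$A2010 / data$A2008
# treatment: the 2008 rate of each row's own currency
data$Treat2008 <- ifelse(data$currency == "A", data$A2008, data$B2008)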

Ideally, after melting I would get 5 columns: Id, Rem, Year, Currency, Treat.

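What is the best way to do this? In Stata I could easily do it with a couple of foreach loops and a reshape, but I would like to do it directly in R rather than constantly importing and exporting the data. Should I do all the operations before reshaping, or are there commands for working on the long format?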
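The foreach-style approach mentioned above can be written in R as plain for loops over the column stems, working on the wide data before any reshaping. This is a minimal sketch, assuming 2008 is the base year for every stem and that each row's treatment is the A or B rate matching its currency; wide, orig and years are hypothetical names, and the data are simulated.

```r
# Hypothetical wide data in the question's layout (5 rows instead of 30)
set.seed(1)
n <- 5
wide <- data.frame(
  ID = LETTERS[1:n],
  currency = c("A", "B", "A", "B", "A"),
  rem2008 = rnorm(n, 10, 5), rem2009 = rnorm(n, 8, 3), rem2010 = rnorm(n, 23, 3),
  A2008 = rnorm(n, 357, 5), A2009 = rnorm(n, 357, 5), A2010 = rnorm(n, 357, 5),
  B2008 = rnorm(n, 1500, 5), B2009 = rnorm(n, 1500, 5), B2010 = rnorm(n, 1500, 5)
)
orig  <- wide        # untouched copy, kept only for comparison
years <- 2008:2010

# Treat<year>: the exchange rate matching each row's currency
for (y in years) {
  wide[[paste0("Treat", y)]] <- ifelse(wide$currency == "A",
                                       wide[[paste0("A", y)]],
                                       wide[[paste0("B", y)]])
}

# Divide every year by the 2008 value of the same stem (2008 becomes 1)
for (pre in c("rem", "A", "B", "Treat")) {
  base <- wide[[paste0(pre, 2008)]]   # copy of the base year, taken before dividing
  for (y in years) {
    col <- paste0(pre, y)
    wide[[col]] <- wide[[col]] / base
  }
}
```

Because Treat<year> is built before the division, it ends up equal to the normalized rate of the row's own currency.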

1 answer:

Answer 0 (score: 1)

I'm not 100% sure I have understood your requirements correctly, so let me know whether this is what you want:

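long <- transform(
  reshape(data, direction = "long",
          # column positions of rem*, A* and B*, one vector per stem
          varying = lapply(c("rem", "A", "B"),
                           function(pre) grep(paste0("^", pre, "\\d+$"), names(data))),
          v.names = c("rem", "A", "B"),
          timevar = "year", times = c(2008, 2009, 2010)),
  treat = ifelse(currency == "A", A, B),  # rate matching each row's currency
  A = NULL, B = NULL, id = NULL            # drop the raw rates and reshape's id
)
long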
##         currency   ID year        rem     treat
## 1.2008         A    J 2008  1.3812811  370.4443
## 2.2008         A    X 2008 12.5763558  354.9506
## 3.2008         A    M 2008 10.6965128  356.8754
## 4.2008         A <NA> 2008 11.3767042  351.8991
## 5.2008         A    A 2008  2.9932609  365.7032
## 6.2008         B    F 2008  8.8211956 1497.4942
## 7.2008         B    K 2008  5.5928628 1495.0110
## 8.2008         B    T 2008 10.7754368 1495.1801
## 9.2008         A <NA> 2008  3.7113704  360.5038
## 10.2008        A    W 2008 11.4948061  359.6662
## 11.2008        A    I 2008 13.6957023  360.6984
## 12.2008        A    P 2008  0.6494516  350.5083
## 13.2008        A    Y 2008  5.9079508  351.2960
## 14.2008        A    C 2008  5.5337997  362.8112
## 15.2008        A    V 2008  5.9769623  359.5147
## 16.2008        A    H 2008 15.8466935  356.9736
## 17.2008        B    G 2008 11.4490855 1501.6727
## 18.2008        A    U 2008 19.8888661  362.6460
## 19.2008        B    E 2008  4.3677369 1508.0391
## 20.2008        B    D 2008 18.5607637 1505.0454
## 21.2008        A    R 2008  9.6847677  357.0053
## 22.2008        A <NA> 2008  6.8128572  356.8416
## 23.2008        A    Q 2008  2.7850994  352.5719
## 24.2008        A    N 2008 16.2758518  362.7819
## 25.2008        B    O 2008  8.9010772 1488.3458
## 26.2008        A    B 2008  7.0311623  357.1095
## 27.2008        A    Z 2008 13.3984208  360.6529
## 28.2008        A    L 2008  7.8227800  352.9813
## 29.2008        A <NA> 2008 14.4264659  350.2792
## 30.2008        A    S 2008 16.6254484  355.3690
## 1.2009         A    J 2009 10.3847732  357.6774
## 2.2009         A    X 2009  8.6973661  359.1131
## 3.2009         A    M 2009 12.0156849  357.1485
## 4.2009         A <NA> 2009  1.3379483  352.8424
## 5.2009         A    A 2009  8.3602428  354.3114
## 6.2009         B    F 2009  6.4862632 1499.4552
## 7.2009         B    K 2009  8.1214409 1504.2514
## 8.2009         B    T 2009 10.4958449 1492.5220
## 9.2009         A <NA> 2009  6.5139695  364.0226
## 10.2009        A    W 2009 13.1565171  361.7706
## 11.2009        A    I 2009  7.3096138  354.9056
## 12.2009        A    P 2009  7.5492308  356.2868
## 13.2009        A    Y 2009  7.5033149  355.2381
## 14.2009        A    C 2009  5.6782097  360.6068
## 15.2009        A    V 2009  3.7370571  361.0948
## 16.2009        A    H 2009  8.8469938  354.0782
## 17.2009        B    G 2009  8.1174960 1494.2991
## 18.2009        A    U 2009  9.2063056  350.2426
## 19.2009        B    E 2009  8.0788092 1507.2134
## 20.2009        B    D 2009  6.6348056 1498.3001
## 21.2009        A    R 2009  7.9650947  353.1720
## 22.2009        A <NA> 2009  3.5795757  354.2822
## 23.2009        A    Q 2009  8.5598213  352.8076
## 24.2009        A    N 2009  6.9101325  350.0534
## 25.2009        B    O 2009  8.7567846 1507.4454
## 26.2009        A    B 2009  7.5298550  360.0886
## 27.2009        A    Z 2009  7.5084281  361.7335
## 28.2009        A    L 2009 10.0847608  367.7977
## 29.2009        A <NA> 2009  6.4676085  358.1437
## 30.2009        A    S 2009 10.9973672  354.5373
## 1.2010         A    J 2010 22.9083834  360.8241
## 2.2010         A    X 2010 21.2021885  362.2936
## 3.2010         A    M 2010 23.8105198  356.7900
## 4.2010         A <NA> 2010 24.1970138  367.8831
## 5.2010         A    A 2010 20.8117066  365.3983
## 6.2010         B    F 2010 24.1926874 1497.4581
## 7.2010         B    K 2010 18.4149211 1492.5715
## 8.2010         B    T 2010 18.6693716 1497.6254
## 9.2010         A <NA> 2010 27.6525893  364.0436
## 10.2010        A    W 2010 23.3530057  353.5919
## 11.2010        A    I 2010 23.4712772  359.9814
## 12.2010        A    P 2010 22.2459282  353.9465
## 13.2010        A    Y 2010 24.0533707  349.6421
## 14.2010        A    C 2010 22.3999464  357.0695
## 15.2010        A    V 2010 19.3420145  360.2639
## 16.2010        A    H 2010 20.4927189  359.7127
## 17.2010        B    G 2010 22.6278613 1498.9522
## 18.2010        A    U 2010 24.4046376  352.0437
## 19.2010        B    E 2010 19.8161101 1498.2641
## 20.2010        B    D 2010 28.4359775 1492.3291
## 21.2010        A    R 2010 23.0095938  364.6818
## 22.2010        A <NA> 2010 20.6857922  363.6210
## 23.2010        A    Q 2010 26.9920258  364.8050
## 24.2010        A    N 2010 25.0578899  356.2132
## 25.2010        B    O 2010 22.1752929 1505.5753
## 26.2010        A    B 2010 19.0098126  355.1049
## 27.2010        A    Z 2010 24.5348854  351.4599
## 28.2010        A    L 2010 29.2015909  354.3519
## 29.2010        A <NA> 2010 24.4550315  358.4514
## 30.2010        A    S 2010 23.1435648  365.8588
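As an alternative to building varying with lapply() and grep(), reshape() can guess the stems and the years itself: with sep = "", it splits each column name just before the first digit that follows a letter (this is described in ?reshape). A sketch on hypothetical simulated data with the question's naming scheme; wide and long are illustrative names, and it assumes every stem has exactly the same set of year suffixes.

```r
# Hypothetical wide data with the question's column naming scheme
set.seed(2)
n <- 4
wide <- data.frame(
  ID = LETTERS[1:n],
  currency = c("A", "B", "B", "A"),
  rem2008 = rnorm(n, 10, 5), rem2009 = rnorm(n, 8, 3), rem2010 = rnorm(n, 23, 3),
  A2008 = rnorm(n, 357, 5), A2009 = rnorm(n, 357, 5), A2010 = rnorm(n, 357, 5),
  B2008 = rnorm(n, 1500, 5), B2009 = rnorm(n, 1500, 5), B2010 = rnorm(n, 1500, 5)
)

# sep = "" lets reshape() split "rem2008" into v.name "rem" and time 2008
long <- reshape(wide, direction = "long",
                varying = grep("^(rem|A|B)\\d+$", names(wide)),
                sep = "", timevar = "year")

# Pick the rate for each row's own currency, then drop the helper columns
long$treat <- ifelse(long$currency == "A", long$A, long$B)
long$A  <- NULL
long$B  <- NULL
long$id <- NULL   # the id column reshape() adds when idvar is not in the data
```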

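For the normalization requirement, if you capture the above result as a variable named long, you can use the following. However, it does not handle the rows where ID is NA (you would have to fill in meaningful key values for the missing data first), and it completely reorders the frame: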

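transform(
  merge(long,
        # 2008 baseline per ID/currency; na.omit() drops the rows whose ID is NA
        na.omit(subset(long, year == 2008)[, c("ID", "currency", "rem", "treat")]),
        by = c("ID", "currency"), all.x = TRUE),
  rem = rem.x / rem.y,        # each year's remittance relative to 2008
  treat = treat.x / treat.y,  # each year's rate relative to 2008
  rem.x = NULL, rem.y = NULL, treat.x = NULL, treat.y = NULL
)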
##      ID currency year        rem     treat
## 1     A        A 2010  6.9528542 0.9991664
## 2     A        A 2008  1.0000000 1.0000000
## 3     A        A 2009  2.7930218 0.9688498
## 4     B        A 2008  1.0000000 1.0000000
## 5     B        A 2009  1.0709261 1.0083424
## 6     B        A 2010  2.7036515 0.9943865
## 7     C        A 2010  4.0478419 0.9841746
## 8     C        A 2008  1.0000000 1.0000000
## 9     C        A 2009  1.0260960 0.9939242
## 10    D        B 2008  1.0000000 1.0000000
## 11    D        B 2009  0.3574640 0.9955182
## 12    D        B 2010  1.5320478 0.9915509
## 13    E        B 2008  1.0000000 1.0000000
## 14    E        B 2009  1.8496556 0.9994525
## 15    E        B 2010  4.5369285 0.9935181
## 16    F        B 2008  1.0000000 1.0000000
## 17    F        B 2009  0.7353043 1.0013095
## 18    F        B 2010  2.7425633 0.9999759
## 19    G        B 2008  1.0000000 1.0000000
## 20    G        B 2009  0.7090082 0.9950897
## 21    G        B 2010  1.9763903 0.9981883
## 22    H        A 2008  1.0000000 1.0000000
## 23    H        A 2009  0.5582864 0.9918890
## 24    H        A 2010  1.2931858 1.0076730
## 25    I        A 2008  1.0000000 1.0000000
## 26    I        A 2009  0.5337159 0.9839401
## 27    I        A 2010  1.7137695 0.9980122
## 28    J        A 2008  1.0000000 1.0000000
## 29    J        A 2010 16.5848818 0.9740304
## 30    J        A 2009  7.5182186 0.9655361
## 31    K        B 2008  1.0000000 1.0000000
## 32    K        B 2009  1.4521080 1.0061808
## 33    K        B 2010  3.2925752 0.9983682
## 34    L        A 2008  1.0000000 1.0000000
## 35    L        A 2009  1.2891531 1.0419753
## 36    L        A 2010  3.7328917 1.0038831
## 37    M        A 2008  1.0000000 1.0000000
## 38    M        A 2009  1.1233273 1.0007651
## 39    M        A 2010  2.2260077 0.9997607
## 40    N        A 2008  1.0000000 1.0000000
## 41    N        A 2009  0.4245635 0.9649144
## 42    N        A 2010  1.5395747 0.9818937
## 43    O        B 2008  1.0000000 1.0000000
## 44    O        B 2009  0.9837893 1.0128328
## 45    O        B 2010  2.4913044 1.0115763
## 46    P        A 2008  1.0000000 1.0000000
## 47    P        A 2009 11.6240089 1.0164859
## 48    P        A 2010 34.2534058 1.0098093
## 49    Q        A 2008  1.0000000 1.0000000
## 50    Q        A 2009  3.0734347 1.0006686
## 51    Q        A 2010  9.6915843 1.0346968
## 52    R        A 2008  1.0000000 1.0000000
## 53    R        A 2009  0.8224353 0.9892627
## 54    R        A 2010  2.3758540 1.0215027
## 55    S        A 2008  1.0000000 1.0000000
## 56    S        A 2009  0.6614779 0.9976596
## 57    S        A 2010  1.3920566 1.0295179
## 58    T        B 2008  1.0000000 1.0000000
## 59    T        B 2009  0.9740528 0.9982222
## 60    T        B 2010  1.7325861 1.0016354
## 61    U        A 2009  0.4628874 0.9657975
## 62    U        A 2008  1.0000000 1.0000000
## 63    U        A 2010  1.2270502 0.9707642
## 64    V        A 2008  1.0000000 1.0000000
## 65    V        A 2009  0.6252435 1.0043951
## 66    V        A 2010  3.2360945 1.0020838
## 67    W        A 2008  1.0000000 1.0000000
## 68    W        A 2009  1.1445619 1.0058508
## 69    W        A 2010  2.0316137 0.9831111
## 70    X        A 2008  1.0000000 1.0000000
## 71    X        A 2009  0.6915649 1.0117269
## 72    X        A 2010  1.6858770 1.0206874
## 73    Y        A 2009  1.2700368 1.0112218
## 74    Y        A 2008  1.0000000 1.0000000
## 75    Y        A 2010  4.0713559 0.9952919
## 76    Z        A 2008  1.0000000 1.0000000
## 77    Z        A 2009  0.5603965 1.0029960
## 78    Z        A 2010  1.8311774 0.9745101
## 79 <NA>        A 2008         NA        NA
## 80 <NA>        A 2008         NA        NA
## 81 <NA>        A 2008         NA        NA
## 82 <NA>        A 2008         NA        NA
## 83 <NA>        A 2009         NA        NA
## 84 <NA>        A 2009         NA        NA
## 85 <NA>        A 2009         NA        NA
## 86 <NA>        A 2009         NA        NA
## 87 <NA>        A 2010         NA        NA
## 88 <NA>        A 2010         NA        NA
## 89 <NA>        A 2010         NA        NA
## 90 <NA>        A 2010         NA        NA
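To work directly on the long format without merge() reordering the rows, each row can instead be looked up against its own 2008 row with match(). A minimal sketch on a small hypothetical long data frame whose values are chosen so the ratios are easy to check; the helper key() and the object base are illustrative names, not part of the answer above.

```r
# Hypothetical long data: three IDs observed in 2008-2010 (no missing IDs)
long <- data.frame(
  ID       = rep(c("J", "X", "F"), times = 3),
  currency = rep(c("A", "A", "B"), times = 3),
  year     = rep(2008:2010, each = 3),
  rem      = c(2, 10, 8,  4, 5, 8,  6, 20, 16),
  treat    = c(350, 360, 1500,  357, 360, 1515,  364, 378, 1500)
)

# For every row, find the 2008 row with the same ID/currency key
base <- long[long$year == 2008, ]
key  <- function(d) paste(d$ID, d$currency, sep = "|")
idx  <- match(key(long), key(base))

# Divide by the base-year values; rows keep their original order
long$rem   <- long$rem / base$rem[idx]
long$treat <- long$treat / base$treat[idx]
long
```

paste() turns a missing ID into the string "NA", so all rows with ID = NA would share one key and be matched to the same 2008 row; as with the merge() version, they need real key values first.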