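I have a dataset similar to the one below, and I still want to modify a few things before reshaping it to long format and using it as a time series. A simplified version of the data: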
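# Remittances per year
rem2008 <- rnorm(30, 10, 5)
rem2009 <- rnorm(30, 8, 3)
rem2010 <- rnorm(30, 23, 3)
# Note: LETTERS has only 26 entries, so LETTERS[1:30] contains 4 NA values
ID <- sample(LETTERS[1:30], 30)
currency <- sample(LETTERS[1:2], 30, replace = TRUE)
# Exchange rates for currencies A and B per year
A2008 <- rnorm(30, 357, 5)
A2009 <- rnorm(30, 357, 5)
A2010 <- rnorm(30, 357, 5)
B2008 <- rnorm(30, 1500, 5)
B2009 <- rnorm(30, 1500, 5)
B2010 <- rnorm(30, 1500, 5)
# data.frame() keeps the numeric columns numeric; cbind() would coerce everything to a character matrix
data <- data.frame(currency, ID, rem2008, rem2009, rem2010, A2008, A2009, A2010, B2008, B2009, B2010)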
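First, I want to "normalize" all exchange rates and remittances (all variables starting with A*, B*, rem*) and generate a "treatment" column holding the relevant exchange rate: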
A2009 = A2009/A2008
A2010 = A2010/A2008
Treat2008[currency=="A"] = A2008
Treat2008[currency=="B"] = B2008
Ideally, after melting I would like to end up with five columns: Id, Rem, Year, Currency, Treat.
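What is the best way to achieve this? I could do it easily in Stata with a couple of foreach loops and a reshape, but I would like to do it directly in R rather than importing and exporting data all the time. Should I do all of the manipulation before reshaping, or are there commands for working on the long format?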
Answer 0 (score: 1)
I'm not 100% sure I've understood your requirements correctly, so let me know whether this is what you want:
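transform(
  with(
    list(w = c('rem', 'A', 'B')),
    reshape(
      data,
      direction = 'long',
      # one group of wide columns per prefix, e.g. rem2008, rem2009, rem2010
      varying = lapply(w, function(pre) grep(paste0('^', pre, '\\d+$'), names(data))),
      v.names = w,
      timevar = 'year',
      times = c(2008, 2009, 2010)
    )
  ),
  treat = ifelse(currency == 'A', A, B),  # exchange rate matching each row's currency
  A = NULL, B = NULL, id = NULL           # drop the helper columns
)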
## currency ID year rem treat
## 1.2008 A J 2008 1.3812811 370.4443
## 2.2008 A X 2008 12.5763558 354.9506
## 3.2008 A M 2008 10.6965128 356.8754
## 4.2008 A <NA> 2008 11.3767042 351.8991
## 5.2008 A A 2008 2.9932609 365.7032
## 6.2008 B F 2008 8.8211956 1497.4942
## 7.2008 B K 2008 5.5928628 1495.0110
## 8.2008 B T 2008 10.7754368 1495.1801
## 9.2008 A <NA> 2008 3.7113704 360.5038
## 10.2008 A W 2008 11.4948061 359.6662
## 11.2008 A I 2008 13.6957023 360.6984
## 12.2008 A P 2008 0.6494516 350.5083
## 13.2008 A Y 2008 5.9079508 351.2960
## 14.2008 A C 2008 5.5337997 362.8112
## 15.2008 A V 2008 5.9769623 359.5147
## 16.2008 A H 2008 15.8466935 356.9736
## 17.2008 B G 2008 11.4490855 1501.6727
## 18.2008 A U 2008 19.8888661 362.6460
## 19.2008 B E 2008 4.3677369 1508.0391
## 20.2008 B D 2008 18.5607637 1505.0454
## 21.2008 A R 2008 9.6847677 357.0053
## 22.2008 A <NA> 2008 6.8128572 356.8416
## 23.2008 A Q 2008 2.7850994 352.5719
## 24.2008 A N 2008 16.2758518 362.7819
## 25.2008 B O 2008 8.9010772 1488.3458
## 26.2008 A B 2008 7.0311623 357.1095
## 27.2008 A Z 2008 13.3984208 360.6529
## 28.2008 A L 2008 7.8227800 352.9813
## 29.2008 A <NA> 2008 14.4264659 350.2792
## 30.2008 A S 2008 16.6254484 355.3690
## 1.2009 A J 2009 10.3847732 357.6774
## 2.2009 A X 2009 8.6973661 359.1131
## 3.2009 A M 2009 12.0156849 357.1485
## 4.2009 A <NA> 2009 1.3379483 352.8424
## 5.2009 A A 2009 8.3602428 354.3114
## 6.2009 B F 2009 6.4862632 1499.4552
## 7.2009 B K 2009 8.1214409 1504.2514
## 8.2009 B T 2009 10.4958449 1492.5220
## 9.2009 A <NA> 2009 6.5139695 364.0226
## 10.2009 A W 2009 13.1565171 361.7706
## 11.2009 A I 2009 7.3096138 354.9056
## 12.2009 A P 2009 7.5492308 356.2868
## 13.2009 A Y 2009 7.5033149 355.2381
## 14.2009 A C 2009 5.6782097 360.6068
## 15.2009 A V 2009 3.7370571 361.0948
## 16.2009 A H 2009 8.8469938 354.0782
## 17.2009 B G 2009 8.1174960 1494.2991
## 18.2009 A U 2009 9.2063056 350.2426
## 19.2009 B E 2009 8.0788092 1507.2134
## 20.2009 B D 2009 6.6348056 1498.3001
## 21.2009 A R 2009 7.9650947 353.1720
## 22.2009 A <NA> 2009 3.5795757 354.2822
## 23.2009 A Q 2009 8.5598213 352.8076
## 24.2009 A N 2009 6.9101325 350.0534
## 25.2009 B O 2009 8.7567846 1507.4454
## 26.2009 A B 2009 7.5298550 360.0886
## 27.2009 A Z 2009 7.5084281 361.7335
## 28.2009 A L 2009 10.0847608 367.7977
## 29.2009 A <NA> 2009 6.4676085 358.1437
## 30.2009 A S 2009 10.9973672 354.5373
## 1.2010 A J 2010 22.9083834 360.8241
## 2.2010 A X 2010 21.2021885 362.2936
## 3.2010 A M 2010 23.8105198 356.7900
## 4.2010 A <NA> 2010 24.1970138 367.8831
## 5.2010 A A 2010 20.8117066 365.3983
## 6.2010 B F 2010 24.1926874 1497.4581
## 7.2010 B K 2010 18.4149211 1492.5715
## 8.2010 B T 2010 18.6693716 1497.6254
## 9.2010 A <NA> 2010 27.6525893 364.0436
## 10.2010 A W 2010 23.3530057 353.5919
## 11.2010 A I 2010 23.4712772 359.9814
## 12.2010 A P 2010 22.2459282 353.9465
## 13.2010 A Y 2010 24.0533707 349.6421
## 14.2010 A C 2010 22.3999464 357.0695
## 15.2010 A V 2010 19.3420145 360.2639
## 16.2010 A H 2010 20.4927189 359.7127
## 17.2010 B G 2010 22.6278613 1498.9522
## 18.2010 A U 2010 24.4046376 352.0437
## 19.2010 B E 2010 19.8161101 1498.2641
## 20.2010 B D 2010 28.4359775 1492.3291
## 21.2010 A R 2010 23.0095938 364.6818
## 22.2010 A <NA> 2010 20.6857922 363.6210
## 23.2010 A Q 2010 26.9920258 364.8050
## 24.2010 A N 2010 25.0578899 356.2132
## 25.2010 B O 2010 22.1752929 1505.5753
## 26.2010 A B 2010 19.0098126 355.1049
## 27.2010 A Z 2010 24.5348854 351.4599
## 28.2010 A L 2010 29.2015909 354.3519
## 29.2010 A <NA> 2010 24.4550315 358.4514
## 30.2010 A S 2010 23.1435648 365.8588
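To see what the reshape call is doing, here is a minimal sketch on a tiny hand-made data frame. The frame wide, its values, and the name long_demo are made up for illustration; only the column naming scheme (rem*, A*, B*) comes from the question.

```r
# Hypothetical two-row wide data frame using the question's naming scheme.
wide <- data.frame(
  ID       = c("a", "b"),
  currency = c("A", "B"),
  rem2008  = c(10, 20),     rem2009 = c(11, 21),
  A2008    = c(350, 351),   A2009   = c(360, 361),
  B2008    = c(1500, 1501), B2009   = c(1510, 1511)
)

prefixes <- c("rem", "A", "B")

# One group of wide columns per prefix: rem2008/rem2009, A2008/A2009, B2008/B2009.
varying_cols <- lapply(prefixes, function(pre) {
  grep(paste0("^", pre, "\\d+$"), names(wide), value = TRUE)
})

# Each group collapses into a single long column named after its prefix.
long_demo <- reshape(
  wide,
  direction = "long",
  varying   = varying_cols,
  v.names   = prefixes,
  timevar   = "year",
  times     = c(2008, 2009)
)

# Pick the exchange rate that matches each row's currency, then keep
# only the columns of interest (this also drops the generated id column).
long_demo$treat <- ifelse(long_demo$currency == "A", long_demo$A, long_demo$B)
long_demo <- long_demo[, c("ID", "currency", "year", "rem", "treat")]
print(long_demo)
```

The result has one row per ID and year, so two IDs over two years give four rows. Passing column names instead of indices to varying is only a readability choice here; reshape accepts either.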
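For the normalization requirement, if you capture the result above in a variable named long, you can use the following. Note that it does not handle the rows with ID = NA (you will have to fill in the missing keys with meaningful values), and that it completely reorders the frame: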
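transform(
  merge(
    long,
    # the 2008 values serve as the baseline for each ID/currency pair
    na.omit(subset(long, year == 2008)[, c('ID', 'currency', 'rem', 'treat')]),
    by = c('ID', 'currency'),
    all.x = TRUE
  ),
  rem = rem.x / rem.y,
  treat = treat.x / treat.y,
  rem.x = NULL, rem.y = NULL, treat.x = NULL, treat.y = NULL
)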
## ID currency year rem treat
## 1 A A 2010 6.9528542 0.9991664
## 2 A A 2008 1.0000000 1.0000000
## 3 A A 2009 2.7930218 0.9688498
## 4 B A 2008 1.0000000 1.0000000
## 5 B A 2009 1.0709261 1.0083424
## 6 B A 2010 2.7036515 0.9943865
## 7 C A 2010 4.0478419 0.9841746
## 8 C A 2008 1.0000000 1.0000000
## 9 C A 2009 1.0260960 0.9939242
## 10 D B 2008 1.0000000 1.0000000
## 11 D B 2009 0.3574640 0.9955182
## 12 D B 2010 1.5320478 0.9915509
## 13 E B 2008 1.0000000 1.0000000
## 14 E B 2009 1.8496556 0.9994525
## 15 E B 2010 4.5369285 0.9935181
## 16 F B 2008 1.0000000 1.0000000
## 17 F B 2009 0.7353043 1.0013095
## 18 F B 2010 2.7425633 0.9999759
## 19 G B 2008 1.0000000 1.0000000
## 20 G B 2009 0.7090082 0.9950897
## 21 G B 2010 1.9763903 0.9981883
## 22 H A 2008 1.0000000 1.0000000
## 23 H A 2009 0.5582864 0.9918890
## 24 H A 2010 1.2931858 1.0076730
## 25 I A 2008 1.0000000 1.0000000
## 26 I A 2009 0.5337159 0.9839401
## 27 I A 2010 1.7137695 0.9980122
## 28 J A 2008 1.0000000 1.0000000
## 29 J A 2010 16.5848818 0.9740304
## 30 J A 2009 7.5182186 0.9655361
## 31 K B 2008 1.0000000 1.0000000
## 32 K B 2009 1.4521080 1.0061808
## 33 K B 2010 3.2925752 0.9983682
## 34 L A 2008 1.0000000 1.0000000
## 35 L A 2009 1.2891531 1.0419753
## 36 L A 2010 3.7328917 1.0038831
## 37 M A 2008 1.0000000 1.0000000
## 38 M A 2009 1.1233273 1.0007651
## 39 M A 2010 2.2260077 0.9997607
## 40 N A 2008 1.0000000 1.0000000
## 41 N A 2009 0.4245635 0.9649144
## 42 N A 2010 1.5395747 0.9818937
## 43 O B 2008 1.0000000 1.0000000
## 44 O B 2009 0.9837893 1.0128328
## 45 O B 2010 2.4913044 1.0115763
## 46 P A 2008 1.0000000 1.0000000
## 47 P A 2009 11.6240089 1.0164859
## 48 P A 2010 34.2534058 1.0098093
## 49 Q A 2008 1.0000000 1.0000000
## 50 Q A 2009 3.0734347 1.0006686
## 51 Q A 2010 9.6915843 1.0346968
## 52 R A 2008 1.0000000 1.0000000
## 53 R A 2009 0.8224353 0.9892627
## 54 R A 2010 2.3758540 1.0215027
## 55 S A 2008 1.0000000 1.0000000
## 56 S A 2009 0.6614779 0.9976596
## 57 S A 2010 1.3920566 1.0295179
## 58 T B 2008 1.0000000 1.0000000
## 59 T B 2009 0.9740528 0.9982222
## 60 T B 2010 1.7325861 1.0016354
## 61 U A 2009 0.4628874 0.9657975
## 62 U A 2008 1.0000000 1.0000000
## 63 U A 2010 1.2270502 0.9707642
## 64 V A 2008 1.0000000 1.0000000
## 65 V A 2009 0.6252435 1.0043951
## 66 V A 2010 3.2360945 1.0020838
## 67 W A 2008 1.0000000 1.0000000
## 68 W A 2009 1.1445619 1.0058508
## 69 W A 2010 2.0316137 0.9831111
## 70 X A 2008 1.0000000 1.0000000
## 71 X A 2009 0.6915649 1.0117269
## 72 X A 2010 1.6858770 1.0206874
## 73 Y A 2009 1.2700368 1.0112218
## 74 Y A 2008 1.0000000 1.0000000
## 75 Y A 2010 4.0713559 0.9952919
## 76 Z A 2008 1.0000000 1.0000000
## 77 Z A 2009 0.5603965 1.0029960
## 78 Z A 2010 1.8311774 0.9745101
## 79 <NA> A 2008 NA NA
## 80 <NA> A 2008 NA NA
## 81 <NA> A 2008 NA NA
## 82 <NA> A 2008 NA NA
## 83 <NA> A 2009 NA NA
## 84 <NA> A 2009 NA NA
## 85 <NA> A 2009 NA NA
## 86 <NA> A 2009 NA NA
## 87 <NA> A 2010 NA NA
## 88 <NA> A 2010 NA NA
## 89 <NA> A 2010 NA NA
## 90 <NA> A 2010 NA NA
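The NA IDs in the sample data arise because LETTERS holds only 26 letters, so LETTERS[1:30] ends in four NA values. One way to sidestep the key problem, sketched below, is to normalize while the data is still wide: each row then carries its own 2008 baseline, so no merge on ID is needed and the original row order is kept. This is only a sketch under assumptions: the small frame wide and its values are made up, and normalize_by_base is a hypothetical helper, not something from the answer above.

```r
# Hypothetical wide data; the third row has a missing ID on purpose.
wide <- data.frame(
  ID       = c("a", "b", NA),
  currency = c("A", "B", "A"),
  rem2008 = c(10, 20, 5),       rem2009 = c(12, 18, 10),     rem2010 = c(15, 30, 20),
  A2008 = c(350, 352, 360),     A2009 = c(357, 350, 342),    A2010 = c(364, 345, 378),
  B2008 = c(1500, 1490, 1510),  B2009 = c(1515, 1500, 1500), B2010 = c(1485, 1520, 1540)
)

years    <- c(2008, 2009, 2010)
prefixes <- c("rem", "A", "B")

# Hypothetical helper: divide every <prefix><year> column by <prefix><base>.
normalize_by_base <- function(df, prefix, years, base = 2008) {
  cols     <- paste0(prefix, years)
  base_col <- df[[paste0(prefix, base)]]  # taken before any column is overwritten
  df[cols] <- lapply(df[cols], function(x) x / base_col)
  df
}

for (pre in prefixes) {
  wide <- normalize_by_base(wide, pre, years)
}

# Reshape the already-normalized columns to long format.
long_norm <- reshape(
  wide,
  direction = "long",
  varying   = lapply(prefixes, function(pre) paste0(pre, years)),
  v.names   = prefixes,
  timevar   = "year",
  times     = years
)

# Keep the (normalized) exchange rate that matches each row's currency.
long_norm$treat <- ifelse(long_norm$currency == "A", long_norm$A, long_norm$B)
long_norm <- long_norm[, c("ID", "currency", "year", "rem", "treat")]
print(long_norm)
```

Because currency is constant within a row, normalizing A and B first and then selecting one of them gives the same treat values as selecting first and dividing by the 2008 value afterwards.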