Winsorize all variables in a data.table

Time: 2017-02-03 02:31:38

Tags: r data.table

My data looks like this:

     gvkey   datadate fyear     cusip curcd      at    ceq  csho   dltt   dvc   nopi  oibdp
 1:  1001 12/31/1981  1981 000165100   USD      NA     NA    NA     NA    NA     NA     NA
 2:  1001 12/31/1982  1982 000165100   USD      NA     NA    NA     NA    NA     NA     NA
 3:  1001 12/31/1983  1983 000165100   USD  14.080  7.823 3.568  4.344 0.000  0.640  2.650
 4:  1001 12/31/1984  1984 000165100   USD  16.267  8.962 3.568  4.181 0.000  0.575  3.208
 5:  1001 12/31/1985  1985 000165100   USD  39.495 13.014 3.988 11.908 0.000  0.623  7.247
 6:  1003 12/31/1981  1981 000354100   USD      NA     NA    NA     NA    NA     NA     NA
 7:  1003 12/31/1982  1982 000354100   USD   5.632  1.983 2.100  1.200 0.000  0.000  1.906
 8:  1003 12/31/1983  1983 000354100   USD   8.529  6.095 2.683  0.950 0.000  0.000  2.138
 9:  1003 12/31/1984  1984 000354100   USD   8.241  6.482 2.683  0.600 0.000  0.000  0.825
10:  1003 01/31/1986  1985 000354100   USD  13.990  6.665 2.683  4.682 0.000  0.000  1.037
11:  1003 01/31/1987  1986 000354100   USD  14.586  7.458 2.683  3.750 0.000  0.000  2.462
12:  1003 01/31/1988  1987 000354100   USD  16.042  7.643 2.683  5.478 0.000  0.000  0.111
13:  1003 01/31/1989  1988 000354100   USD  16.280 -0.194 2.683  0.104 0.000  0.000 -3.680
14:  1003 01/31/1990  1989 000354100   USD  10.109 -0.416 2.683  0.076 0.000  0.000 -1.532
15:  1004 05/31/1981  1980 000361105   USD  83.075 29.721 2.566 23.400 1.019  0.464 11.058
16:  1004 05/31/1982  1981 000361105   USD 113.653 42.423 3.942 29.412 1.582 -0.088 12.652
17:  1004 05/31/1983  1982 000361105   USD 111.288 43.225 3.932 23.504 1.727  0.000 13.174
18:  1004 05/31/1984  1983 000361105   USD 137.228 81.085 6.007 13.040 1.965  0.522 15.208
19:  1004 05/31/1985  1984 000361105   USD 155.405 87.385 6.036 16.415 2.893  0.387 21.398
20:  1004 05/31/1986  1985 000361105   USD 198.287 95.381 9.099 25.022 3.700  1.132 27.282
    prstkc pstkrv prcc_c
 1:     NA     NA     NA
 2:     NA     NA     NA
 3:  0.000    0.0  7.250
 4:  0.000    0.0  3.750
 5:  0.009    0.0 10.125
 6:     NA     NA     NA
 7:  0.000    1.2     NA
 8:  1.200    0.0  5.250
 9:  0.000    0.0  2.750
10:  0.000    0.0  4.375
11:  0.000    0.0  4.250
12:  0.000    0.0  2.750
13:  0.000    0.0  1.750
14:  0.000    0.0  0.750
15:  0.000    0.0 13.625
16:  0.518    0.0  7.625
17:  0.211    0.0  9.375
18:  0.003    0.0 17.250
19:  0.000    0.0 18.500
20:  0.000    0.0 25.875

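Apart from the identifier variables (gvkey, datadate, fyear, cusip, and curcd), I want to winsorize all variables at the 5% and 95% levels. Specifically, I want to drop every observation in which at least one numeric variable has a value in the (0, 5%) or (95%, 100%) percentile range. I want to do this within data.table.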
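
One possible approach is sketched below; it is not a definitive answer. The question describes dropping rows, which is usually called trimming, while winsorizing normally means clamping extreme values to the cutoffs, so the sketch shows both. The toy table `dt`, the shortened `id_cols` list, and the choice to leave `NA` values unflagged are assumptions made for illustration. Percentiles are computed over the whole column, not per gvkey.

```r
library(data.table)

# Toy stand-in for the table in the question (values are made up for illustration).
set.seed(1)
dt <- data.table(
  gvkey = rep(1001:1004, each = 25),
  fyear = rep(1981:2005, times = 4),
  at    = c(NA, rnorm(99, mean = 50, sd = 20)),
  ceq   = c(rnorm(99, mean = 20, sd = 10), NA)
)

id_cols  <- c("gvkey", "fyear")           # identifier columns, left untouched
num_cols <- setdiff(names(dt), id_cols)   # every other column is processed

# Trimming: flag values strictly below the 5th or above the 95th percentile.
# NA values are not flagged (an assumption; the question does not say how to treat them).
outside <- dt[, lapply(.SD, function(x) {
  q <- quantile(x, probs = c(0.05, 0.95), na.rm = TRUE)
  !is.na(x) & (x < q[[1]] | x > q[[2]])
}), .SDcols = num_cols]

# Drop a row if any of its numeric columns is flagged.
drop_row <- Reduce(`|`, outside)
trimmed  <- dt[!drop_row]

# Winsorizing in the usual sense: clamp values to the cutoffs and keep every row.
winsorized <- copy(dt)
winsorized[, (num_cols) := lapply(.SD, function(x) {
  q <- quantile(x, probs = c(0.05, 0.95), na.rm = TRUE)
  pmin(pmax(x, q[[1]]), q[[2]])
}), .SDcols = num_cols]
```

With the real data, `id_cols` would be `c("gvkey", "datadate", "fyear", "cusip", "curcd")`. If the cutoffs should be computed within each year or firm, adding a `by =` argument to both `dt[...]` calls is the natural extension, although the row order of `outside` would then need to be realigned with `dt` before filtering.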

0 answers:

No answers