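Create a conditional variable by group in panel data

Time: 2017-04-30 11:00:34

Tags: postgresql stata

I am trying to analyze the relationship between the probability of a call and the distance of vehicles.

An example dataset (here as csv) looks like this: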

id  day         time    called  d   
1   2009-06-24  1700    0       1037.6  
1   2009-06-24  1710    1       1191.9   
1   2009-06-24  1720    0       165.5    

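The real dataset has 10 million rows. Each id represents a location that either called or did not call in a given time window (here 10 minutes long). I first want to drop all rows for an id that never called at that time on any day of the whole period. That leaves only the rows for ids that called at the given time on at least one day of the analysis period.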

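I want to create a variable that has the value 0 in the row of a call, equals -1 at the same time on the day before (or hour, week, month, whatever, but here day), +1 on the day after, and so on. Later I will use that variable together with called and distance as input for analysis and comparison across locations.

I have looked through other answered questions but could not find anything suitable, so an answer or a pointer to one would be appreciated. I am using Stata 13, but solutions using Postgres 9.3 or R are also welcome.

I need to repeat this procedure many times for multiple datasets, so ideally I would like to automate it as much as possible.

Update

Here is an example of the desired result:

id  day         time    called  d       newvar  newvar2
1   2009-06-24  1700    0       1037.6  null
1   2009-06-24  1710    1       1191.9  0       -2
1   2009-06-24  1720    0       165.5   -1
1   2009-06-25  1700    0       526.7   null
1   2009-06-25  1710    0       342.5   1       -1
1   2009-06-25  1720    1       416.1   0
1   2009-06-26  1700    0       428.3   null
1   2009-06-26  1710    1       240.7   2       0
1   2009-06-26  1720    0       228.7   1
1   2009-06-27  1700    0       282.5   null
1   2009-06-27  1710    0       182.1   3       1
1   2009-06-27  1720    0       195.5   2
2   2009-06-24  1700    0       198.0   -1
2   2009-06-24  1710    0       157.4   null
2   2009-06-24  1720    0       234.9   null
2   2009-06-25  1700    1       247.0   0

I added newvar2 because some locations may call more than once in a given time window.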

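1 answer:

Answer 0: (score: 2)

When looking for a Stata solution, it is best to provide a data example using dataex (from SSC).

The problem is hard to see until the data are sorted by id time (and then further by day). I did not convert the day variable to a Stata numeric date because, as constructed, the string sort order matches the natural date order.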
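If the string order could not be relied on (for example, with dates stored as day/month/year), the conversion is short. The following is a minimal sketch, assuming the day strings are in year-month-day form as in the question. The variable names ndate and gap are invented for this illustration. One day is deliberately left out to show that differences of numeric dates count calendar days rather than rows.

```stata
clear
input byte id str10 day int time byte called
1 "2009-06-24" 1710 1
1 "2009-06-25" 1710 0
1 "2009-06-27" 1710 0
end

* parse the string with a year-month-day mask into a Stata daily date
gen long ndate = date(day, "YMD")
format ndate %td

* numeric dates sort in calendar order
sort id time ndate

* days elapsed since the previous observation in the same id time group
by id time: gen gap = ndate - ndate[_n-1]

list, noobs
```

With every day present, as in the example data, an index-based offset and a date-based offset give the same numbers; they would only diverge if some days were missing from a group.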

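For each call within an id time group, you appear to want the day offset relative to the day of the call. This can be done by generating an order variable that tracks the index of the current observation within each id time group, and then subtracting the index of the observation in which the call was made.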
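For a single id time group with exactly one call, the arithmetic can be sketched as follows. This is only an illustration of the idea, using the 1720 slot of id 1 from the question. The names callpos and offset are invented here, and the shortcut of summing order * called is valid only when the group contains exactly one call.

```stata
clear
input byte id str10 day int time byte called
1 "2009-06-24" 1720 0
1 "2009-06-25" 1720 1
1 "2009-06-26" 1720 0
1 "2009-06-27" 1720 0
end

* position of each observation within its id time group, in day order
sort id time day
by id time: gen order = _n

* position of the single called observation, copied to the whole group
by id time: egen callpos = total(order * called)

* offset relative to the call; left missing for groups without a call
gen offset = order - callpos if callpos > 0

list, noobs
```

Here the call sits at position 2, so the offsets come out as -1, 0, 1, 2, matching newvar for this group in the question.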

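Since there can be more than one call per time slot, you have to loop over the maximum number of calls observed in any time slot in the data.

This solution produces results that differ slightly from yours: you appear to have overlooked the call on 2009-06-27 at 1710 for id == 2.

In the example below, the original data are sorted by id time day so that readers can better see what is going on.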

* Example generated by -dataex-. To install: ssc install dataex
clear
input byte id str10 day int time byte called float distance str4 newvar byte newvar2
1 "2009-06-24" 1700 0 1037.6 "null"  .
1 "2009-06-25" 1700 0  526.7 "null"  .
1 "2009-06-26" 1700 0  428.3 "null"  .
1 "2009-06-27" 1700 0  282.5 "null"  .
1 "2009-06-24" 1710 1 1191.9 "0"    -2
1 "2009-06-25" 1710 0  342.5 "1"    -1
1 "2009-06-26" 1710 1  240.7 "2"     0
1 "2009-06-27" 1710 0  182.1 "3"     1
1 "2009-06-24" 1720 0  165.5 "-1"    .
1 "2009-06-25" 1720 1  416.1 "0"     .
1 "2009-06-26" 1720 0  228.7 "1"     .
1 "2009-06-27" 1720 0  195.5 "2"     .
2 "2009-06-24" 1700 0    198 "-1"    .
2 "2009-06-25" 1700 1    247 "0"     .
2 "2009-06-26" 1700 0  188.7 "1"     .
2 "2009-06-27" 1700 0  203.5 "2"     .
2 "2009-06-24" 1710 0  157.4 "null"  .
2 "2009-06-25" 1710 0  221.3 "null"  .
2 "2009-06-26" 1710 0  283.8 "null"  .
2 "2009-06-27" 1710 1   91.7 "null"  .
2 "2009-06-24" 1720 0  234.9 "null"  .
2 "2009-06-25" 1720 0  249.6 "null"  .
2 "2009-06-26" 1720 0  279.7 "null"  .
2 "2009-06-27" 1720 0  198.2 "null"  .
3 "2009-06-24" 1700 0  156.1 "-1"    .
3 "2009-06-25" 1700 1   19.9 "0"     .
3 "2009-06-26" 1700 0  195.2 "1"     .
3 "2009-06-27" 1700 0  306.2 "2"     .
3 "2009-06-24" 1710 0  150.1 "null"  .
3 "2009-06-25" 1710 0  163.7 "null"  .
3 "2009-06-26" 1710 0  288.2 "null"  .
3 "2009-06-27" 1710 0  311.7 "null"  .
3 "2009-06-24" 1720 0  135.1 "-2"    .
3 "2009-06-25" 1720 0    186 "-1"    .
3 "2009-06-26" 1720 1  297.2 "0"     .
3 "2009-06-27" 1720 0  375.9 "1"     .
end

* order observations by date within an id time group
sort id time day
by id time: gen order = _n

* number of calls at any given time
by id time: gen call = sum(called)

* repeat enough to cover the max number of calls per time
sum call, meanonly
local n = r(max)
forvalues i = 1/`n' {
    // the index of the called observation in the id time group
    by id time: gen index = order if called & call == `i'

    // replicate the index for all observations in the id time group
    by id time: egen gindex = total(index)

    // the relative position of each obs in groups with a call
    gen wanted`i' = order - gindex if gindex > 0

    drop index gindex
}

list, sepby(id time) noobs compress

And the results:

. list, sepby(id time) noobs compress

  +----------------------------------------------------------------------------------------+
  | id          day   time   cal~d   dist~e   new~r   new~2   order   call   wan~1   wan~2 |
  |----------------------------------------------------------------------------------------|
  |  1   2009-06-24   1700       0   1037.6    null       .       1      0       .       . |
  |  1   2009-06-25   1700       0    526.7    null       .       2      0       .       . |
  |  1   2009-06-26   1700       0    428.3    null       .       3      0       .       . |
  |  1   2009-06-27   1700       0    282.5    null       .       4      0       .       . |
  |----------------------------------------------------------------------------------------|
  |  1   2009-06-24   1710       1   1191.9       0      -2       1      1       0      -2 |
  |  1   2009-06-25   1710       0    342.5       1      -1       2      1       1      -1 |
  |  1   2009-06-26   1710       1    240.7       2       0       3      2       2       0 |
  |  1   2009-06-27   1710       0    182.1       3       1       4      2       3       1 |
  |----------------------------------------------------------------------------------------|
  |  1   2009-06-24   1720       0    165.5      -1       .       1      0      -1       . |
  |  1   2009-06-25   1720       1    416.1       0       .       2      1       0       . |
  |  1   2009-06-26   1720       0    228.7       1       .       3      1       1       . |
  |  1   2009-06-27   1720       0    195.5       2       .       4      1       2       . |
  |----------------------------------------------------------------------------------------|
  |  2   2009-06-24   1700       0      198      -1       .       1      0      -1       . |
  |  2   2009-06-25   1700       1      247       0       .       2      1       0       . |
  |  2   2009-06-26   1700       0    188.7       1       .       3      1       1       . |
  |  2   2009-06-27   1700       0    203.5       2       .       4      1       2       . |
  |----------------------------------------------------------------------------------------|
  |  2   2009-06-24   1710       0    157.4    null       .       1      0      -3       . |
  |  2   2009-06-25   1710       0    221.3    null       .       2      0      -2       . |
  |  2   2009-06-26   1710       0    283.8    null       .       3      0      -1       . |
  |  2   2009-06-27   1710       1     91.7    null       .       4      1       0       . |
  |----------------------------------------------------------------------------------------|
  |  2   2009-06-24   1720       0    234.9    null       .       1      0       .       . |
  |  2   2009-06-25   1720       0    249.6    null       .       2      0       .       . |
  |  2   2009-06-26   1720       0    279.7    null       .       3      0       .       . |
  |  2   2009-06-27   1720       0    198.2    null       .       4      0       .       . |
  |----------------------------------------------------------------------------------------|
  |  3   2009-06-24   1700       0    156.1      -1       .       1      0      -1       . |
  |  3   2009-06-25   1700       1     19.9       0       .       2      1       0       . |
  |  3   2009-06-26   1700       0    195.2       1       .       3      1       1       . |
  |  3   2009-06-27   1700       0    306.2       2       .       4      1       2       . |
  |----------------------------------------------------------------------------------------|
  |  3   2009-06-24   1710       0    150.1    null       .       1      0       .       . |
  |  3   2009-06-25   1710       0    163.7    null       .       2      0       .       . |
  |  3   2009-06-26   1710       0    288.2    null       .       3      0       .       . |
  |  3   2009-06-27   1710       0    311.7    null       .       4      0       .       . |
  |----------------------------------------------------------------------------------------|
  |  3   2009-06-24   1720       0    135.1      -2       .       1      0      -2       . |
  |  3   2009-06-25   1720       0      186      -1       .       2      0      -1       . |
  |  3   2009-06-26   1720       1    297.2       0       .       3      1       0       . |
  |  3   2009-06-27   1720       0    375.9       1       .       4      1       1       . |
  +----------------------------------------------------------------------------------------+
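
The question also asks to first drop every id time group in which no call was ever made. That step is not part of the code above. A minimal sketch, assuming the same variable names (anycall is a name invented here), could look like this:

```stata
clear
input byte id str10 day int time byte called
1 "2009-06-24" 1700 0
1 "2009-06-25" 1700 0
1 "2009-06-24" 1710 1
1 "2009-06-25" 1710 0
end

* flag id time groups with at least one call on any day
bysort id time: egen byte anycall = max(called)

* keep only the groups that called at least once
keep if anycall
drop anycall

list, noobs
```

Applied to the example data, this would remove the groups in which wanted1 is missing throughout (id 1 at 1700, id 2 at 1720 and id 3 at 1710).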