Transforming a dataset into a more compact format (Stata)

Time: 2017-05-22 20:24:01

Tags: database stata

I am starting from a dataset that looks like this:



+------+--------+-----------+-------+
| date |  geo   | variables | value |
+------+--------+-----------+-------+
| 1981 | Canada | var1      | #     |
| 1982 | Canada | var1      | #     |
| 1983 | Canada | var1      | #     |
|  ... | ...    | ...       | ...   |
| 2015 | Canada | var1      | #     |
| 1981 | Canada | var2      | #     |
| 1982 | Canada | var2      | #     |
|  ... | ...    | ...       | ...   |
| 2015 | Canada | var2      | #     |
| 1981 | Quebec | var1      | #     |
| 1982 | Quebec | var1      | #     |
| 1983 | Quebec | var1      | #     |
|  ... | ...    | ...       | ...   |
| 2015 | Quebec | var1      | #     |
| 1981 | Quebec | var2      | #     |
| 1982 | Quebec | var2      | #     |
|  ... | ...    | ...       | ...   |
| 2015 | Quebec | var2      | #     |
+------+--------+-----------+-------+




So I have 35 time periods, two geographies, and two variables. I would like to reshape the table in Stata so that it looks like this:



+------+--------+------+------+
| date |  geo   | var1 | var2 |
+------+--------+------+------+
| 1981 | Canada | #    | #    |
| 1982 | Canada | #    | #    |
|  ... | ...    | ...  | ...  |
| 2015 | Canada | #    | #    |
| 1981 | Quebec | #    | #    |
| 1982 | Quebec | #    | #    |
|  ... | ...    | ...  | ...  |
| 2015 | Quebec | #    | #    |
+------+--------+------+------+




However, I have not had much success. I tried splitting the observations into separate variables with the command:

separate value, by(variables) generate(var)

which created something like this:



+------+--------+------+------+
| date |  geo   | var1 | var2 |
+------+--------+------+------+
| 1981 | Canada | #    | .    |
| 1982 | Canada | #    | .    |
|  ... | ...    | ...  | ...  |
| 2015 | Canada | #    | .    |
| 1981 | Canada | .    | #    |
| 1982 | Canada | .    | #    |
|  ... | ...    | ...  | ...  |
| 2015 | Canada | .    | #    |
| 1981 | Quebec | #    | .    |
| 1982 | Quebec | #    | .    |
|  ... | ...    | ...  | ...  |
| 2015 | Quebec | #    | .    |
| 1981 | Quebec | .    | #    |
| 1982 | Quebec | .    | #    |
|  ... | ...    | ...  | ...  |
| 2015 | Quebec | .    | #    |
+------+--------+------+------+




which contains a lot of useless missing values.

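So, more specifically, I am looking for a solution that either takes the first table (A) directly to the second (B), i.e. without using separate, or fixes the third table (C) into B.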

Thank you very much.

1 answer:

Answer 0 (score: 3)

Without sample data, my answer will have to be untested. I think something like the following will start you in the right direction.

reshape wide value, i(date geo) j(variables) string

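Note that this assumes the contents of the variable variables are suitable for use in variable names. For example, a value of 1potato for variables would be a problem, since a Stata variable name cannot begin with a digit.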
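One way to check this before reshaping is sketched below. It uses made-up values and the built-in function strtoname(), which converts a string into a legal Stata name; bad_name is a hypothetical helper variable. Testing each value as a name on its own is a conservative check, given that reshape prepends the stub value to it.

```stata
* Minimal sketch with made-up values; bad_name is a hypothetical helper variable
clear
input str10 variables
"var1"
"1potato"
"gdp growth"
end

* Flag values that strtoname() would have to change to make them legal names
generate byte bad_name = variables != strtoname(variables)
list if bad_name

* One possible repair: replace the flagged values with their cleaned form
replace variables = strtoname(variables) if bad_name
list
```

If two distinct values happen to clean to the same name, they would need to be told apart by hand before reshaping.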

In any case,

help reshape

should be your first stop.
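For the other route asked about in the question, repairing the output of separate rather than starting over, collapse with the firstnm statistic may be enough. The following is only a sketch on made-up numbers, assuming each date and geo pair has exactly one nonmissing value per variable:

```stata
* Made-up data shaped like the output of -separate-:
* each row carries a value for only one of var1 and var2
clear
input int date str6 geo var1 var2
1981 "Canada" 111 .
1982 "Canada" 211 .
1981 "Canada" . 112
1982 "Canada" . 212
1981 "Quebec" 121 .
1982 "Quebec" 221 .
1981 "Quebec" . 122
1982 "Quebec" . 222
end

* Keep the first nonmissing value of each variable within every date-geo pair
collapse (firstnm) var1 var2, by(date geo)

sort geo date
list, sepby(geo)
```

Any variable not named in collapse or in by(), such as the original variables and value columns, is dropped from the result.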

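Added in response to a comment: here is some data I made up and a demonstration that reshape works on it. Perhaps you can explain how these data differ from your actual data. Your error message suggests that for some combination of date and geo, a particular value of variables occurs more than once.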

. list, sepby(geo)

     +----------------------------------+
     | date      geo   variab~s   value |
     |----------------------------------|
  1. | 1981   Canada       var1     111 |
  2. | 1982   Canada       var1     211 |
  3. | 1983   Canada       var1     311 |
  4. | 1981   Canada       var2     112 |
  5. | 1982   Canada       var2     212 |
  6. | 1983   Canada       var2     312 |
     |----------------------------------|
  7. | 1981   Quebec       var1     121 |
  8. | 1982   Quebec       var1     221 |
  9. | 1983   Quebec       var1     321 |
 10. | 1981   Quebec       var2     122 |
 11. | 1982   Quebec       var2     222 |
 12. | 1983   Quebec       var2     322 |
     +----------------------------------+

. reshape wide value, i(geo date) j(variables) string
(note: j = var1 var2)

Data                               long   ->   wide
-----------------------------------------------------------------------------
Number of obs.                       12   ->       6
Number of variables                   4   ->       4
j variable (2 values)         variables   ->   (dropped)
xij variables:
                                  value   ->   valuevar1 valuevar2
-----------------------------------------------------------------------------

. rename (value*) (*)

. list, sepby(geo)

     +-----------------------------+
     | date      geo   var1   var2 |
     |-----------------------------|
  1. | 1981   Canada    111    112 |
  2. | 1982   Canada    211    212 |
  3. | 1983   Canada    311    312 |
     |-----------------------------|
  4. | 1981   Quebec    121    122 |
  5. | 1982   Quebec    221    222 |
  6. | 1983   Quebec    321    322 |
     +-----------------------------+

.
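To look for repeated combinations of date, geo and variables in the real data, something along the following lines may help. This is a sketch on made-up data in which one combination is deliberately duplicated; dup is a hypothetical helper variable.

```stata
* Made-up data: the combination 1981 / Canada / var1 appears twice
clear
input int date str6 geo str4 variables value
1981 "Canada" "var1" 111
1981 "Canada" "var1" 999
1981 "Canada" "var2" 112
1981 "Quebec" "var1" 121
1981 "Quebec" "var2" 122
end

* Summarize how many combinations occur more than once
duplicates report date geo variables

* Tag the repeated observations (dup counts the other copies) and list them
duplicates tag date geo variables, generate(dup)
list if dup > 0, sepby(geo)
```

Once the repeats are resolved, isid date geo variables should run without error, which is the uniqueness that reshape wide relies on.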