I am working with a dataset that looks like this:
+------+--------+-----------+-------+
| date | geo | variables | value |
+------+--------+-----------+-------+
| 1981 | Canada | var1 | # |
| 1982 | Canada | var1 | # |
| 1983 | Canada | var1 | # |
| ... | ... | ... | ... |
| 2015 | Canada | var1 | # |
| 1981 | Canada | var2 | # |
| 1982 | Canada | var2 | # |
| ... | ... | ... | ... |
| 2015 | Canada | var2 | # |
| 1981 | Quebec | var1 | # |
| 1982 | Quebec | var1 | # |
| 1983 | Quebec | var1 | # |
| ... | ... | ... | ... |
| 2015 | Quebec | var1 | # |
| 1981 | Quebec | var2 | # |
| 1982 | Quebec | var2 | # |
| ... | ... | ... | ... |
| 2015 | Quebec | var2 | # |
+------+--------+-----------+-------+

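So I have 35 time periods (1981 to 2015), two geographies, and two variables. I would like to reshape the table in Stata so that it looks like this: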
+------+--------+------+------+
| date | geo | var1 | var2 |
+------+--------+------+------+
| 1981 | Canada | # | # |
| 1982 | Canada | # | # |
| ... | ... | ... | ... |
| 2015 | Canada | # | # |
| 1981 | Quebec | # | # |
| 1982 | Quebec | # | # |
| ... | ... | ... | ... |
| 2015 | Quebec | # | # |
+------+--------+------+------+

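However, I have not had much success. I tried splitting the observations into separate variables with the command:
separate value, by(variables) generate(var)
which produced something like this: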
+------+--------+------+------+
| date | geo | var1 | var2 |
+------+--------+------+------+
| 1981 | Canada | # | . |
| 1982 | Canada | # | . |
| ... | ... | ... | ... |
| 2015 | Canada | # | . |
| 1981 | Canada | . | # |
| 1982 | Canada | . | # |
| ... | ... | ... | ... |
| 2015 | Canada | . | # |
| 1981 | Quebec | # | . |
| 1982 | Quebec | # | . |
| ... | ... | ... | ... |
| 2015 | Quebec | # | . |
| 1981 | Quebec | . | # |
| 1982 | Quebec | . | # |
| ... | ... | ... | ... |
| 2015 | Quebec | . | # |
+------+--------+------+------+

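This contains a lot of useless missing values.
So, to be more specific, I am looking for a solution that either takes table A (the first table) directly to table B (the second table), i.e. without using separate, or fixes table C (the third table) so that it becomes table B.
Many thanks.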
Answer 0 (score: 3)
Without sample data, my answer has to be untested. I think the following will start you in the right direction.
reshape wide value, i(date geo) j(variables) string
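Note that this assumes the contents of the variable variables are suitable for use as variable names. For example, a value such as 1potato would be a problem.
In any event,
help reshape
should be your first stop.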
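As a rough, untested sketch of how one might check that assumption before reshaping: the built-in strtoname() function converts a string into a legal Stata name, so any value that it changes would not work as a name on its own. The helper variable badname is a hypothetical name introduced only for this illustration.

```stata
* Sketch only: flag values of -variables- that are not legal Stata names.
* badname is a hypothetical helper variable, not part of the original data.
generate byte badname = variables != strtoname(variables)

* How many observations are affected, and which values are they?
count if badname
list variables if badname

* One option (an assumption about what you want): substitute the legal
* version that strtoname() produces, e.g. 1potato becomes _1potato.
replace variables = strtoname(variables) if badname
drop badname
```

Whether replacing the values is appropriate depends on your data; recoding them by hand to meaningful names may be the better choice.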
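Added in response to a comment: below is some data I made up, along with a demonstration that reshape works on it. Perhaps you can explain how this data differs from your actual data. Your error message indicates that, for some combinations of date and geo, a particular value of variables occurs more than once.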
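One way to look for such repeated combinations, sketched here on the assumption that your variables are named date, geo, variables, and value as in the question, is the duplicates command. The variable dup is a hypothetical name used only for this illustration.

```stata
* Sketch only: find (date, geo, variables) combinations that occur more than once.
* Report how many copies of each combination exist.
duplicates report date geo variables

* Tag the repeated observations; dup is a hypothetical helper variable
* holding the number of extra copies of each combination.
duplicates tag date geo variables, generate(dup)
list date geo variables value if dup > 0, sepby(geo)
drop dup

* Alternatively, right after a failed -reshape-, this lists the
* observations that caused the problem.
reshape error
```

How to resolve any duplicates found (dropping exact copies, averaging, or correcting the source data) is a judgment call that depends on why they are there.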
. list, sepby(geo)
+----------------------------------+
| date geo variab~s value |
|----------------------------------|
1. | 1981 Canada var1 111 |
2. | 1982 Canada var1 211 |
3. | 1983 Canada var1 311 |
4. | 1981 Canada var2 112 |
5. | 1982 Canada var2 212 |
6. | 1983 Canada var2 312 |
|----------------------------------|
7. | 1981 Quebec var1 121 |
8. | 1982 Quebec var1 221 |
9. | 1983 Quebec var1 321 |
10. | 1981 Quebec var2 122 |
11. | 1982 Quebec var2 222 |
12. | 1983 Quebec var2 322 |
+----------------------------------+
. reshape wide value, i(geo date) j(variables) string
(note: j = var1 var2)
Data long -> wide
-----------------------------------------------------------------------------
Number of obs. 12 -> 6
Number of variables 4 -> 4
j variable (2 values) variables -> (dropped)
xij variables:
value -> valuevar1 valuevar2
-----------------------------------------------------------------------------
. rename (value*) (*)
. list, sepby(geo)
+-----------------------------+
| date geo var1 var2 |
|-----------------------------|
1. | 1981 Canada 111 112 |
2. | 1982 Canada 211 212 |
3. | 1983 Canada 311 312 |
|-----------------------------|
4. | 1981 Quebec 121 122 |
5. | 1982 Quebec 221 222 |
6. | 1983 Quebec 321 322 |
+-----------------------------+
.
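For the second half of the question (fixing table C rather than starting from table A), a possible alternative that I have not tested against your data is collapse with the firstnm statistic, which keeps the first nonmissing value within each group. This assumes the data are in the form produced by separate, with var1 and var2 each missing on the rows that belong to the other variable, and that each date and geo pair has at most one nonmissing value per variable.

```stata
* Sketch only: merge the half-empty rows left by -separate- into one row
* per (date, geo) pair, keeping the first nonmissing var1 and var2.
* Variables not named here (variables, value) are dropped by -collapse-.
collapse (firstnm) var1 var2, by(date geo)

* Restore the geo-then-date ordering used in the listings above.
sort geo date
list, sepby(geo)
```

If a pair has more than one nonmissing value for the same variable, collapse silently keeps only the first, so reshape remains the safer route because it stops with an error instead.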