I have the following data:
id  test1  test1_date          test2  test2_date
1   2      Jun 23, 2014 21:29  26     Jun 20, 2014 06:27
1   2      Jun 24, 2014 01:44  25     Jun 21, 2014 02:53
1   2      Jun 24, 2014 06:20  25     Jun 22, 2014 07:38
2   2      Jun 25, 2014 22:15  30     Jun 26, 2014 11:08
2   0      Jun 26, 2014 02:35  25     Jun 27, 2014 20:09
2   2      Jun 26, 2014 06:49  25     Jun 30, 2014 14:47
This is what is called wide format. I would like to convert it to long format, as follows:
id  date                test  value
1   Jun 20, 2014 06:27  2     26
1   Jun 21, 2014 02:53  2     25
1   Jun 22, 2014 07:38  2     25
1   Jun 23, 2014 21:29  1     2
1   Jun 24, 2014 01:44  1     2
1   Jun 24, 2014 06:20  1     2
2   Jun 25, 2014 22:15  1     2
2   Jun 26, 2014 02:35  1     0
2   Jun 26, 2014 06:49  1     2
2   Jun 26, 2014 11:08  2     30
2   Jun 27, 2014 20:09  2     25
2   Jun 30, 2014 14:47  2     25
I tried the reshape command:
reshape test1 test2, i(id)
However, it creates a vector of missing values. Another attempt was:
reshape long test1 test2 , i(id test1_date test2_date)
Answer:
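It's a matter of personal taste, but I'd advise against the term "format" here. It's already overloaded in Stata (display formats, file formats). I'd suggest just "shape".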
The problem is solvable, but you need two small tricks and need to correct one misunderstanding:
Stata wants another identifier variable, because its basic idea is that reshape should be reversible. So one must be created, even if it is arbitrary.
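The variable names would benefit from some work: reshape long expects each set of paired variables to share a stub followed by a numeric suffix, as test1 and test2 already do and as date1 and date2 will after renaming.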
The date variables can't be used to identify groups of observations, because their values are all distinct and never repeat.
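To see the first point concretely, here is a minimal sketch using three made-up observations (hypothetical toy data, not the data from the question). With i(id) alone, reshape long refuses because id repeats; the exact error message and return code may vary across Stata versions, so the sketch just stores whatever _rc is in a local called rc_id (a name chosen for this illustration). Adding an arbitrary within-id counter j fixes the problem, and typing reshape wide with no arguments afterwards takes the data back to where they started, which is the reversibility Stata insists on.

```stata
* Hypothetical toy data: id repeats, so id alone cannot serve as i()
clear
input id test1 test2
1 2 26
1 2 25
2 0 30
end

* Fails: id does not uniquely identify the observations in wide form
capture reshape long test, i(id)
local rc_id = _rc
display "return code with i(id) alone: " `rc_id'

* Works once an arbitrary counter within id is added
bysort id : gen j = _n
reshape long test, i(id j)

* Reversibility: with no arguments, reshape wide restores the wide layout
reshape wide
list
```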
There is generic advice in this FAQ (in addition to the help file and the manual entry for reshape).
It takes just a few lines of syntax to set up reproducible code. A good question would do that for us!
. clear
. input id test1 str18 Stest1_date test2 str18 Stest2_date
id test1 Stest1_date test2 Stest2_date
1. 1 2 "Jun 23, 2014 21:29" 26 "Jun 20, 2014 06:27"
2. 1 2 "Jun 24, 2014 01:44" 25 "Jun 21, 2014 02:53"
3. 1 2 "Jun 24, 2014 06:20" 25 "Jun 22, 2014 07:38"
4. 2 2 "Jun 25, 2014 22:15" 30 "Jun 26, 2014 11:08"
5. 2 0 "Jun 26, 2014 02:35" 25 "Jun 27, 2014 20:09"
6. 2 2 "Jun 26, 2014 06:49" 25 "Jun 30, 2014 14:47"
7. end
.
. gen double test1_date = clock(Stest1_date, "MDY hm")
. gen double test2_date = clock(Stest2_date, "MDY hm")
. drop S*
. format t*date %tc
. l, sepby(id)
     +--------------------------------------------------------------+
     | id   test1   test2           test1_date           test2_date |
     |--------------------------------------------------------------|
  1. |  1       2      26   23jun2014 21:29:00   20jun2014 06:27:00 |
  2. |  1       2      25   24jun2014 01:44:00   21jun2014 02:53:00 |
  3. |  1       2      25   24jun2014 06:20:00   22jun2014 07:38:00 |
     |--------------------------------------------------------------|
  4. |  2       2      30   25jun2014 22:15:00   26jun2014 11:08:00 |
  5. |  2       0      25   26jun2014 02:35:00   27jun2014 20:09:00 |
  6. |  2       2      25   26jun2014 06:49:00   30jun2014 14:47:00 |
     +--------------------------------------------------------------+
.
. bysort id : gen j = _n
. rename (test1_date test2_date) (date1 date2)
. reshape long test date, i(id j)
(note: j = 1 2)
Data                               wide   ->   long
-----------------------------------------------------------------------------
Number of obs.                        6   ->      12
Number of variables                   6   ->       5
j variable (2 values)                     ->   _j
xij variables:
                            test1 test2   ->   test
                            date1 date2   ->   date
-----------------------------------------------------------------------------
. l, sepby(id)
     +-----------------------------------------+
     | id   j   _j   test                 date |
     |-----------------------------------------|
  1. |  1   1    1      2   23jun2014 21:29:00 |
  2. |  1   1    2     26   20jun2014 06:27:00 |
  3. |  1   2    1      2   24jun2014 01:44:00 |
  4. |  1   2    2     25   21jun2014 02:53:00 |
  5. |  1   3    1      2   24jun2014 06:20:00 |
  6. |  1   3    2     25   22jun2014 07:38:00 |
     |-----------------------------------------|
  7. |  2   1    1      2   25jun2014 22:15:00 |
  8. |  2   1    2     30   26jun2014 11:08:00 |
  9. |  2   2    1      0   26jun2014 02:35:00 |
 10. |  2   2    2     25   27jun2014 20:09:00 |
 11. |  2   3    1      2   26jun2014 06:49:00 |
 12. |  2   3    2     25   30jun2014 14:47:00 |
     +-----------------------------------------+
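The listing above still differs from the layout requested in the question: there is a leftover j, the 1/2 indicator is called _j, and the observations are ordered by j rather than by date. A minimal sketch of the remaining steps follows as a self-contained do-file. One small deviation: it names the numeric dates date1 and date2 when creating them rather than renaming afterwards; the reshape itself is the same. Renaming test to value before renaming _j to test avoids a name clash.

```stata
* Rebuild the example data from the question
clear
input id test1 str18 Stest1_date test2 str18 Stest2_date
1 2 "Jun 23, 2014 21:29" 26 "Jun 20, 2014 06:27"
1 2 "Jun 24, 2014 01:44" 25 "Jun 21, 2014 02:53"
1 2 "Jun 24, 2014 06:20" 25 "Jun 22, 2014 07:38"
2 2 "Jun 25, 2014 22:15" 30 "Jun 26, 2014 11:08"
2 0 "Jun 26, 2014 02:35" 25 "Jun 27, 2014 20:09"
2 2 "Jun 26, 2014 06:49" 25 "Jun 30, 2014 14:47"
end

* Convert the strings to datetimes (milliseconds, so store as double)
gen double date1 = clock(Stest1_date, "MDY hm")
gen double date2 = clock(Stest2_date, "MDY hm")
drop S*
format date* %tc

* Arbitrary counter within id so that i(id j) identifies observations
bysort id : gen j = _n
reshape long test date, i(id j)

* Tidy up to the requested layout: id date test value
drop j
rename test value
rename _j test
sort id date
order id date test value
list, sepby(id)
```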