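I have two datasets from different years that I am trying to combine with the append command. Both contain several variables and a person identifier variable, but the identifiers are not the same in the two datasets.
Dataset 1: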
PID Year Car Sex Age
420201 2016 0 Female 70
420202 2016 0 Male 87
420204 2016 0 Female 62
420205 2016 1 Female 34
420207 2016 1 Male 48
Dataset 2:
PID Year Car Sex Age
420202 2014 1 Male 59
420204 2014 0 Female 76
420205 2014 1 Male 37
420207 2014 1 Male 23
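The problem is that when I try to append these datasets, Stata produces a dataset in which values belonging to some identifiers in one dataset are assigned to the wrong identifiers from the other dataset.
Appended dataset: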
PID Year Car Sex Age
420201 2016 0 Female 70
420201 2014 1 Male 59
420202 2016 0 Male 87
420202 2014 0 Female 76
420204 2016 0 Female 62
420204 2014 1 Male 37
420205 2016 1 Female 34
420205 2014 1 Male 23
420207 2016 1 Male 48
420207 2014 1 Male 23
Is there a workaround for this?
Answer 0 (score: 9)
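I have run into this problem before. It happens because PID has value labels attached: what you are actually seeing are the labels attached to the underlying values 1, 2, 3, 4, 5, not the stored values themselves.
To illustrate this, consider the following example:
clear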
input PID Year Car str6 Sex Age
1 2016 0 Female 70
2 2016 0 Male 87
3 2016 0 Female 62
4 2016 1 Female 34
5 2016 1 Male 48
end
label define PID 1 "420201" 2 "420202" 3 "420204" 4 "420205" 5 "420207"
label values PID PID
* save this dataset so that it can be appended later
save data1, replace
list
+------------------------------------+
| PID Year Car Sex Age |
|------------------------------------|
1. | 420201 2016 0 Female 70 |
2. | 420202 2016 0 Male 87 |
3. | 420204 2016 0 Female 62 |
4. | 420205 2016 1 Female 34 |
5. | 420207 2016 1 Male 48 |
+------------------------------------+
list, nolabel
+---------------------------------+
| PID Year Car Sex Age |
|---------------------------------|
1. | 1 2016 0 Female 70 |
2. | 2 2016 0 Male 87 |
3. | 3 2016 0 Female 62 |
4. | 4 2016 1 Female 34 |
5. | 5 2016 1 Male 48 |
+---------------------------------+
So when you try to append, the following happens (your value labels may be defined differently, but the idea is the same):
clear
input PID Year Car str6 Sex Age
1 2014 1 Male 59
2 2014 0 Female 76
3 2014 1 Male 37
4 2014 1 Male 23
end
label define PID 1 "420202" 2 "420204" 3 "420205" 4 "420207"
label values PID PID
save data2, replace
append using data1
sort PID
list
+------------------------------------+
| PID Year Car Sex Age |
|------------------------------------|
1. | 420202 2014 1 Male 59 |
2. | 420202 2016 0 Female 70 |
3. | 420204 2016 0 Male 87 |
4. | 420204 2014 0 Female 76 |
5. | 420205 2014 1 Male 37 |
|------------------------------------|
6. | 420205 2016 0 Female 62 |
7. | 420207 2016 1 Female 34 |
8. | 420207 2014 1 Male 23 |
9. | 5 2016 1 Male 48 |
+------------------------------------+
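To check whether the same thing is happening in your own data, you can inspect the identifier before appending. The following is only a minimal sketch on hypothetical toy data (the two rows and the label name PID are assumptions made for illustration); it shows three standard ways of looking behind a value label.

```stata
* Hypothetical toy data: PID is stored as 1, 2 but displayed as the real IDs
clear
input PID Year
1 2016
2 2016
end
label define PID 1 "420201" 2 "420202"
label values PID PID

* The "value label" column of the output names the attached label, if any
describe PID

* Show which stored value maps to which label text
label list PID

* Show the stored values instead of the labels
list, nolabel
```

If describe reports a value label for the identifier in either file, and label list shows different mappings in the two files, the append will mix up the identifiers as shown above.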
To append the two datasets correctly, you first need to convert PID to a string:
foreach dta in data1 data2 {
use `dta', clear
decode PID, generate(PID2)
drop PID
rename PID2 PID
save `dta', replace
}
append using data1
order PID
sort PID
list
+------------------------------------+
| PID Year Car Sex Age |
|------------------------------------|
1. | 420201 2016 0 Female 70 |
2. | 420202 2016 0 Male 87 |
3. | 420202 2014 1 Male 59 |
4. | 420204 2016 0 Female 62 |
5. | 420204 2014 0 Female 76 |
|------------------------------------|
6. | 420205 2016 1 Female 34 |
7. | 420205 2014 1 Male 37 |
8. | 420207 2016 1 Male 48 |
9. | 420207 2014 1 Male 23 |
+------------------------------------+
You may also want to use the destring command to convert the new string PID variable to a numeric variable.
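As a rough sketch of that last step, assuming PID has already been decoded to a string as in the loop above (the three rows below are hypothetical toy data, not the poster's files):

```stata
* Hypothetical toy data: PID is a string, as it would be after decode
clear
input str6 PID Year
"420201" 2016
"420202" 2016
"420202" 2014
end

* Convert the string identifier to a numeric variable in place
destring PID, replace

* PID is now numeric, holds the real identifiers, and has no value label
describe PID
list
```

Note that destring leaves the variable unchanged if any value contains nonnumeric characters, so this only works when every identifier is purely numeric.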