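Appending datasets leads to incorrect results

Time: 2018-10-14 15:36:00

Tags: stata

I have two datasets from different years that I am trying to combine with the append command. Both contain several variables and a person identifier variable, but the set of identifiers is not the same in the two datasets.

Dataset 1: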

PID        Year     Car  Sex       Age
420201     2016     0    Female    70
420202     2016     0    Male      87
420204     2016     0    Female    62
420205     2016     1    Female    34
420207     2016     1    Male      48

Dataset 2:

PID        Year     Car   Sex      Age
420202     2014     1     Male     59
420204     2014     0     Female   76
420205     2014     1     Male     37
420207     2014     1     Male     23

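The problem is that when I append these datasets, Stata produces a dataset in which the values belonging to some identifiers from one dataset are wrongly assigned to identifiers from the other dataset.

Appended dataset: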

PID        Year     Car   Sex      Age
420201     2016     0     Female   70
420201     2014     1     Male     59
420202     2016     0     Male     87
420202     2014     0     Female   76
420204     2016     0     Female   62
420204     2014     1     Male     37
420205     2016     1     Female   34
420205     2014     1     Male     23
420207     2016     1     Male     48
420207     2014     1     Male     23

Is there a workaround for this?

1 answer:

Answer 0 (score: 9)

I have run into this problem before. It happens because PID carries a value label: what you actually see in the listing are the labels attached to the values of PID, not the underlying values themselves.

To illustrate this, consider the following example, in which the underlying values of PID are 1, 2, 3, 4, 5:

    clear
    input PID Year Car str6 Sex Age
    1 2016 0 Female 70
    2 2016 0 Male   87
    3 2016 0 Female 62
    4 2016 1 Female 34
    5 2016 1 Male   48
    end

    * attach a value label that displays the codes 1-5 as person IDs
    label define PID 1 "420201" 2 "420202" 3 "420204" 4 "420205" 5 "420207"
    label values PID PID

    * save the first dataset so that it can be appended later
    save data1, replace

    list

         +------------------------------------+
         |    PID   Year   Car      Sex   Age |
         |------------------------------------|
      1. | 420201   2016     0   Female    70 |
      2. | 420202   2016     0     Male    87 |
      3. | 420204   2016     0   Female    62 |
      4. | 420205   2016     1   Female    34 |
      5. | 420207   2016     1     Male    48 |
         +------------------------------------+

    list, nolabel

         +---------------------------------+
         | PID   Year   Car      Sex   Age |
         |---------------------------------|
      1. |   1   2016     0   Female    70 |
      2. |   2   2016     0     Male    87 |
      3. |   3   2016     0   Female    62 |
      4. |   4   2016     1   Female    34 |
      5. |   5   2016     1     Male    48 |
         +---------------------------------+

So when you try to append, the following happens:

    clear
    input PID Year Car str6 Sex Age
    1 2014 1 Male   59
    2 2014 0 Female 76
    3 2014 1 Male   37
    4 2014 1 Male   23
    end

    * same label name, but the codes map to different person IDs
    label define PID 1 "420202" 2 "420204" 3 "420205" 4 "420207"
    label values PID PID

    save data2, replace

    append using data1
    sort PID
    list

         +------------------------------------+
         |    PID   Year   Car      Sex   Age |
         |------------------------------------|
      1. | 420202   2014     1     Male    59 |
      2. | 420202   2016     0   Female    70 |
      3. | 420204   2016     0     Male    87 |
      4. | 420204   2014     0   Female    76 |
      5. | 420205   2014     1     Male    37 |
         |------------------------------------|
      6. | 420205   2016     0   Female    62 |
      7. | 420207   2016     1   Female    34 |
      8. | 420207   2014     1     Male    23 |
      9. |      5   2016     1     Male    48 |
         +------------------------------------+

The label definition already in memory (the one from data2) is kept, so the 2016 observations are displayed with the wrong IDs, and the value 5 has no label at all.

The definition of your value labels may be different, but the idea is the same.

To append the two datasets correctly, you first need to convert PID to a string variable:

    foreach dta in data1 data2 {
        use `dta', clear
        * decode stores the label text (the real ID) in a new string variable
        decode PID, generate(PID2)
        drop PID
        rename PID2 PID
        save `dta', replace
    }

    * data2 is still in memory after the loop
    append using data1
    order PID
    sort PID
    list

         +------------------------------------+
         |    PID   Year   Car      Sex   Age |
         |------------------------------------|
      1. | 420201   2016     0   Female    70 |
      2. | 420202   2016     0     Male    87 |
      3. | 420202   2014     1     Male    59 |
      4. | 420204   2016     0   Female    62 |
      5. | 420204   2014     0   Female    76 |
         |------------------------------------|
      6. | 420205   2016     1   Female    34 |
      7. | 420205   2014     1     Male    37 |
      8. | 420207   2016     1     Male    48 |
      9. | 420207   2014     1     Male    23 |
         +------------------------------------+

You may also want to convert the new string PID variable back to a numeric variable with the destring command.
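
As a minimal sketch of that last step: the lines below assume the appended dataset from the decode loop is still in memory and that the string PID contains only digits, as it does in this example. If your real labels contain other characters, destring will leave the variable as a string unless you tell it how to handle them, so treat this as an illustration rather than a general recipe.

```stata
* Assumption: the appended data are in memory and PID is a string
* variable such as "420201" (digits only).
destring PID, replace

* Check the result: PID should now be numeric with no value label,
* so listing with and without labels shows the same IDs.
describe PID
list PID Year in 1/3, nolabel
```

Because the numeric PID now stores the real identifier rather than an arbitrary code, later append or merge steps no longer depend on how each file defined its value label.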