Question

I am trying to update some missing values in a dataset with values from another dataset.

Here is an example in Stata 14.2:

sysuse auto, clear   

// save in order to merge below
save auto, replace 

// create some missing to update
replace length = . if length < 175

// so that the two datasets are not identical, as in my real use case
drop if _n == _N

merge 1:1 make using auto, nogen keep(master match_update) update

The code above keeps only the updated observations (26 observations). If I use keep(match_update) instead, the result is exactly the same.

Why doesn't Stata keep all of the observations in the master dataset?

Note that leaving out match_update (i.e., keep(master)) does not help either, because it drops all observations.

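My current workaround is to rename the original variables, merge in all variables, and then replace the originals wherever they are missing. However, this does not take advantage of the update option, and it becomes tedious when many variables need updating.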
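A minimal sketch of that workaround, assuming the auto data from the example above. The local macro tofill and the _m suffix are illustrative names, not part of the original code; adding more variable names to tofill extends it, with the loops renaming, filling, and restoring each one.

```stata
// recreate the setup from the question
sysuse auto, clear
save auto, replace
replace length = . if length < 175
drop if _n == _N

// hypothetical list of variables to fill from the using dataset
local tofill length

// keep the master copies under temporary names
foreach v of local tofill {
    rename `v' `v'_m
}

// plain merge (no update): brings in the using values of the listed variables
merge 1:1 make using auto, keepusing(`tofill') keep(master match) nogenerate

// fill missing master values, then restore the original names
foreach v of local tofill {
    replace `v'_m = `v' if missing(`v'_m)
    drop `v'
    rename `v'_m `v'
}
```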

Answer 1

Personally, I always prefer to keep or drop observations manually using the _merge variable, because it is more transparent and less error-prone.

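However, the following does what you want. The missing piece is match: with update, matched observations that did not need updating are classified as match (_merge == 3), so keep(master match_update) discards them.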

merge 1:1 make using auto, nogenerate keep(master match match_update) update

Result                           # of obs.
-----------------------------------------
not matched                             0

matched                                73
    not updated                        47  
    missing updated                    26  
    nonmissing conflict                 0  
-----------------------------------------
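One way to see where these counts come from is to run the merge without keep() and tabulate _merge. The sketch below assumes the same setup as the question. Its final keep if line is a hand-written equivalent of keep(master match match_update), using Stata's _merge codes 1 (master only), 3 (matched, not updated), and 4 (missing updated).

```stata
// recreate the setup from the question
sysuse auto, clear
save auto, replace
replace length = . if length < 175
drop if _n == _N

// merge without keep() so that every _merge category is visible
merge 1:1 make using auto, update
tabulate _merge

// same selection as keep(master match match_update)
keep if inlist(_merge, 1, 3, 4)
```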

You can confirm that this gives the same result as dropping the using-only observations by hand:

sysuse auto, clear   
save auto, replace

replace length = . if length < 175
drop if _n == _N

merge 1:1 make using auto, update 

drop if _merge == 2
drop _merge
save m1

sysuse auto, clear   
save auto, replace

replace length = . if length < 175 
drop if _n == _N 

merge 1:1 make using auto, nogen keep(master match match_update) update 
save m2

cf _all using m1

display r(Nsum)
0

Why doesn't merge with update work as expected?
