How to use bysort to build a new string variable

Date: 2018-12-15 08:32:14

Tags: stata

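I want to write Stata code that generates a new string variable, suspect, listing the urid values within each ssuid whose corresponding score is less than or equal to 60 or is missing.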

Input:

primkey  ssuid  sup  urid  score
10312551  1255  601  122   60
10312552  1255  601  122   80
10312553  1255  601  123   90
10312554  1255  601  124   66
10312561  1256  601  122   40
10312562  1256  601  123   30
10312563  1256  601  124   .
10312564  1256  601  125   66
10312581  1258  602  126   80
10312582  1258  602  127   95
10312583  1258  602  127   100
10312584  1258  602  128   .

Output:

ssuid  sup  suspect
1255   601  122
1256   601  122,123,124
1258   602  128

The variables primkey and score are not needed in the output.

Here is the code I have tried:

sort state ssuid urid sup
gen x=_n
gen suspect=.
replace suspect=urid if (score <=60 | score==.) & urid=urid[x-1]
drop x
sort state ssuid suspect
gen x=_n
tostring suspect, replace
replace suspect=suspect[x-1]+","+suspect if suspect!="." & ssuid==ssuid[x-1]
drop x
gen x=strlen(ssususpect_sup)
gsort state ssuid -x
drop x
gen x=_n
replace suspect=suspect_sup[x-1] if ssuid==ssuid[x-1]
drop x

bys ssuid:gen x=_n 
keep if x==1
drop x

However, this does not produce the expected result.

1 answer:

Answer 0: (score: 1)

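Your example is not completely consistent.

For ssuid 1255, the lowest score is 60. So no scores are less than 60.

For ssuid 1258, the lowest score is 80.

Stata's rule is that missing values are treated as arbitrarily large positive numbers, so a missing score does not count as less than 60.

Your rule seems to be: less than or equal to 60, or missing.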
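To make the point about missing values concrete, here is a minimal sketch on four made-up scores (the variable names below, atmost, flagged and trap, are purely illustrative and not from the question). It shows why the rule has to test for missing explicitly.

```stata
clear
input score
40
60
80
.
end

* score <= 60 is false for the missing row, because . sorts above every number
gen byte atmost = score <= 60

* the rule that matches the desired output: at most 60, or missing
gen byte flagged = score <= 60 | missing(score)

* the usual trap: score > 60 is TRUE for the missing row
gen byte trap = score > 60

list
```

Here atmost and flagged differ only on the missing row, and trap is 1 for both 80 and the missing value.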

With those fixes, this works for me:

clear 
input primkey  ssuid  sup  urid  score
10312551  1255  601  122   60
10312552  1255  601  122   80
10312553  1255  601  123   90
10312554  1255  601  124   66
10312561  1256  601  122   40
10312562  1256  601  123   30
10312563  1256  601  124   .
10312564  1256  601  125   66
10312581  1258  602  126   80
10312582  1258  602  127   95
10312583  1258  602  127   100
10312584  1258  602  128   .
end 

egen tokeep = max(score <= 60 | missing(score)), by(ssuid) 
keep if tokeep 
drop tokeep 
drop if inrange(score, 61, .) 

bysort ssuid (primkey) : gen suspect = string(urid) if _n == 1 
by ssuid: replace suspect = suspect[_n-1] + ///
"," + string(urid) if _n > 1 & urid != urid[_n-1] 
by ssuid: keep if _n == _N 

keep ssuid sup suspect 
list 

     +---------------------------+
     | ssuid   sup       suspect |
     |---------------------------|
  1. |  1255   601           122 |
  2. |  1256   601   122,123,124 |
  3. |  1258   602           128 |
     +---------------------------+
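As a possible variant (a sketch, not the code from the answer above), the flagged rows can be reduced to one row per ssuid and urid before the list is accumulated. The assumption is that a flagged urid should appear only once per ssuid, even if it occurs on several rows. Note that this version keeps rows by the rule itself, so a non-integer score such as 60.5 would be dropped here.

```stata
clear
input primkey  ssuid  sup  urid  score
10312551  1255  601  122   60
10312552  1255  601  122   80
10312553  1255  601  123   90
10312554  1255  601  124   66
10312561  1256  601  122   40
10312562  1256  601  123   30
10312563  1256  601  124   .
10312564  1256  601  125   66
10312581  1258  602  126   80
10312582  1258  602  127   95
10312583  1258  602  127   100
10312584  1258  602  128   .
end

* keep only rows that satisfy the rule: score at most 60, or missing
keep if score <= 60 | missing(score)

* one row per ssuid-urid pair, so no urid can be listed twice
bysort ssuid urid (primkey) : keep if _n == 1

* build the comma-separated list cumulatively within each ssuid
bysort ssuid (urid) : gen suspect = string(urid) if _n == 1
by ssuid : replace suspect = suspect[_n-1] + "," + string(urid) if _n > 1

* the last row of each group holds the complete list
by ssuid : keep if _n == _N

keep ssuid sup suspect
list
```

On the example data this should reproduce the same three rows as the listing above. Because duplicates are removed first, every row after the first in a group can append unconditionally, which avoids gaps in the accumulated string.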

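EDIT Let's look at the original code. I have added some comments and found some code that can be simplified and one clear bug, but after that I bail out. Questions must be reproducible!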

sort state ssuid urid sup
gen x=_n
gen suspect=.
replace suspect=urid if (score <=60 | score==.) & urid=urid[x-1]
drop x
sort state ssuid suspect
gen x=_n
tostring suspect, replace
replace suspect=suspect[x-1]+","+suspect if suspect!="." & ssuid==ssuid[x-1]
drop x
gen x=strlen(ssususpect_sup)
gsort state ssuid -x
drop x
gen x=_n
replace suspect=suspect_sup[x-1] if ssuid==ssuid[x-1]
drop x
bys ssuid:gen x=_n 
keep if x==1
drop x

*1 There is nothing in the data example about a variable -state-. 
* What you don't show, we need not try to replicate 

*2 Your use of x for _n isn't needed. You can use _n directly. 
* That saves 8 lines 

sort ssuid urid sup
gen suspect=.
replace suspect=urid if (score <=60 | score==.) & urid=urid[_n-1]
sort ssuid suspect
tostring suspect, replace
replace suspect=suspect[_n-1]+","+suspect if suspect!="." & ssuid==ssuid[_n-1]
gen x=strlen(ssususpect_sup)
gsort ssuid -x
drop x
replace suspect=suspect_sup[_n-1] if ssuid==ssuid[_n-1]
bys ssuid: keep if _n == 1 

*3 Your first three lines can be slimmed to one 

bysort ssuid urid (sup) : gen suspect = urid if (score <=60 | score==.) & urid=urid[_n-1]
sort ssuid suspect
tostring suspect, replace
replace suspect=suspect[_n-1]+","+suspect if suspect!="." & ssuid==ssuid[_n-1]
gen x=strlen(ssususpect_sup)
gsort ssuid -x
drop x
replace suspect=suspect_sup[_n-1] if ssuid==ssuid[_n-1]
bys ssuid: keep if _n == 1 

*4 It's simpler to create -suspect- as string in the first place
* so cut the -tostring- line  

bysort ssuid urid (sup) : gen suspect = string(urid) if (score <=60 | score==.) & urid=urid[_n-1]
sort ssuid suspect
replace suspect=suspect[_n-1]+","+suspect if suspect!="." & ssuid==ssuid[_n-1]
gen x=strlen(ssususpect_sup)
gsort ssuid -x
drop x
replace suspect=suspect_sup[_n-1] if ssuid==ssuid[_n-1]
bys ssuid: keep if _n == 1 

*5 bug: testing for equality needs == not = 

bysort ssuid urid (sup) : gen suspect = string(urid) if (score <=60 | score==.) & urid==urid[_n-1]
sort ssuid suspect
replace suspect=suspect[_n-1]+","+suspect if suspect!="." & ssuid==ssuid[_n-1]
gen x=strlen(ssususpect_sup)
gsort ssuid -x
drop x
replace suspect=suspect_sup[_n-1] if ssuid==ssuid[_n-1]
bys ssuid: keep if _n == 1 

*6 you refer to a variable -ssususpect_sup- which you haven't supplied as 
* data, or created in your code. At this point I bail out.
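
Comments 2 and 3 rest on how _n behaves under by:. The following is a small sketch on made-up data (the variables first_a and first_b are hypothetical names used only for this illustration). Without by:, _n counts across the whole dataset, so group boundaries must be checked by hand. With bysort, _n restarts at 1 in each group.

```stata
clear
input ssuid urid
1 122
1 122
1 123
2 122
2 124
end

* without by: compare both ssuid and urid with the previous observation
sort ssuid urid
gen byte first_a = ssuid != ssuid[_n-1] | urid != urid[_n-1]

* with by: _n and [_n-1] are interpreted within each ssuid group
bysort ssuid (urid) : gen byte first_b = _n == 1 | urid != urid[_n-1]

* both approaches flag the first occurrence of each urid within ssuid
assert first_a == first_b
list
```

Variables named in parentheses after bysort determine the sort order within groups but do not define the groups themselves.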