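I want to write Stata code that, within each ssuid, builds a new string variable suspect from the urid values whose corresponding score is less than or equal to 60 or missing.
Input: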
primkey ssuid sup urid score
10312551 1255 601 122 60
10312552 1255 601 122 80
10312553 1255 601 123 90
10312554 1255 601 124 66
10312561 1256 601 122 40
10312562 1256 601 123 30
10312563 1256 601 124 .
10312564 1256 601 125 66
10312581 1258 602 126 80
10312582 1258 602 127 95
10312583 1258 602 127 100
10312584 1258 602 128 .
Output:
ssuid sup suspect
1255 601 122
1256 601 122,123,124
1258 602 128
The variables primkey and score are not needed in the output.
Here is the code I have tried so far:
sort state ssuid urid sup
gen x=_n
gen suspect=.
replace suspect=urid if (score <=60 | score==.) & urid=urid[x-1]
drop x
sort state ssuid suspect
gen x=_n
tostring suspect, replace
replace suspect=suspect[x-1]+","+suspect if suspect!="." & ssuid==ssuid[x-1]
drop x
gen x=strlen(ssususpect_sup)
gsort state ssuid -x
drop x
gen x=_n
replace suspect=suspect_sup[x-1] if ssuid==ssuid[x-1]
drop x
bys ssuid:gen x=_n
keep if x==1
drop x
However, this does not produce the expected result.
Answer 0 (score: 1)
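Your example isn't quite complete.
For ssuid 1255 the lowest score is 60, so no score is lower than 60.
For ssuid 1258 the lowest score is 80.
Stata's rule is that missing values count as arbitrarily large positive numbers, so they are not less than 60.
Your rule seems to be: less than or equal to 60, or missing.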
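The missing-value rule is easy to check directly. The sketch below uses four made-up scores (an assumption for illustration, not the data from the question) to show that a numeric missing never satisfies a "less than or equal" condition and has to be asked for explicitly, while a "greater than" condition picks it up silently.

```stata
clear
input score
40
60
80
.
end

* Numeric missing (.) is larger than any nonmissing value,
* so it does not satisfy score <= 60: this counts 2 (40 and 60)
count if score <= 60

* Missing values must be requested explicitly: this counts 3
count if score <= 60 | missing(score)

* The flip side: score > 60 is true for missing too, so this counts 2 (80 and .)
count if score > 60

* Excluding missing gives the count that is usually intended: 1
count if score > 60 & !missing(score)
```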
With those fixes, this works for me:
clear
input primkey ssuid sup urid score
10312551 1255 601 122 60
10312552 1255 601 122 80
10312553 1255 601 123 90
10312554 1255 601 124 66
10312561 1256 601 122 40
10312562 1256 601 123 30
10312563 1256 601 124 .
10312564 1256 601 125 66
10312581 1258 602 126 80
10312582 1258 602 127 95
10312583 1258 602 127 100
10312584 1258 602 128 .
end
egen tokeep = max(score <= 60 | missing(score)), by(ssuid)
keep if tokeep
drop tokeep
drop if inrange(score, 61, .)
bysort ssuid (primkey) : gen suspect = string(urid) if _n == 1
by ssuid: replace suspect = suspect[_n-1] + ///
"," + string(urid) if _n > 1 & urid != urid[_n-1]
by ssuid: keep if _n == _N
keep ssuid sup suspect
list
     +---------------------------+
     | ssuid   sup       suspect |
     |---------------------------|
  1. |  1255   601           122 |
  2. |  1256   601   122,123,124 |
  3. |  1258   602           128 |
     +---------------------------+
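A variant of the same idea, offered only as a sketch: the code above skips a repeated urid only when the repeat sits on the adjacent row in primkey order, and it drops scores of 61 or more, which presumes integer scores. Assuming neither can be relied on in the real data, one option is to keep the qualifying rows with the rule stated directly, reduce them to one row per ssuid and urid, and then build the list cumulatively. On the example data this should reproduce the same three rows.

```stata
clear
input primkey ssuid sup urid score
10312551 1255 601 122 60
10312552 1255 601 122 80
10312553 1255 601 123 90
10312554 1255 601 124 66
10312561 1256 601 122 40
10312562 1256 601 123 30
10312563 1256 601 124 .
10312564 1256 601 125 66
10312581 1258 602 126 80
10312582 1258 602 127 95
10312583 1258 602 127 100
10312584 1258 602 128 .
end

* keep only rows that match the rule: score at most 60, or missing
keep if score <= 60 | missing(score)

* one row per ssuid-urid pair, so duplicates need not be adjacent
bysort ssuid urid : keep if _n == 1

* build the comma-separated list cumulatively within ssuid, in urid order
bysort ssuid (urid) : gen suspect = string(urid) if _n == 1
by ssuid : replace suspect = suspect[_n-1] + "," + string(urid) if _n > 1

* the last row in each ssuid holds the complete list
by ssuid : keep if _n == _N
keep ssuid sup suspect
list
```

As with the code above, an ssuid with no qualifying rows simply does not appear in the result.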
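EDIT: Let's look at the original code. I've added some comments, and identified some simplifications and one outright bug, but after that I bail out. Questions must be reproducible!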
sort state ssuid urid sup
gen x=_n
gen suspect=.
replace suspect=urid if (score <=60 | score==.) & urid=urid[x-1]
drop x
sort state ssuid suspect
gen x=_n
tostring suspect, replace
replace suspect=suspect[x-1]+","+suspect if suspect!="." & ssuid==ssuid[x-1]
drop x
gen x=strlen(ssususpect_sup)
gsort state ssuid -x
drop x
gen x=_n
replace suspect=suspect_sup[x-1] if ssuid==ssuid[x-1]
drop x
bys ssuid:gen x=_n
keep if x==1
drop x
*1 There is nothing in the data example about a variable -state-.
* What you don't show, we need not try to replicate
*2 Your use of x for _n isn't needed. You can use _n directly.
* That saves 8 lines
sort ssuid urid sup
gen suspect=.
replace suspect=urid if (score <=60 | score==.) & urid=urid[_n-1]
sort ssuid suspect
tostring suspect, replace
replace suspect=suspect[_n-1]+","+suspect if suspect!="." & ssuid==ssuid[_n-1]
gen x=strlen(ssususpect_sup)
gsort ssuid -x
drop x
replace suspect=suspect_sup[_n-1] if ssuid==ssuid[_n-1]
bys ssuid: keep if _n == 1
*3 Your first three lines can be slimmed to one
bysort ssuid urid (sup) : gen suspect = urid if (score <=60 | score==.) & urid=urid[_n-1]
sort ssuid suspect
tostring suspect, replace
replace suspect=suspect[_n-1]+","+suspect if suspect!="." & ssuid==ssuid[_n-1]
gen x=strlen(ssususpect_sup)
gsort ssuid -x
drop x
replace suspect=suspect_sup[_n-1] if ssuid==ssuid[_n-1]
bys ssuid: keep if _n == 1
*4 It's simpler to create -suspect- as string in the first place
* so cut the -tostring- line
bysort ssuid urid (sup) : gen suspect = string(urid) if (score <=60 | score==.) & urid=urid[_n-1]
sort ssuid suspect
replace suspect=suspect[_n-1]+","+suspect if suspect!="." & ssuid==ssuid[_n-1]
gen x=strlen(ssususpect_sup)
gsort ssuid -x
drop x
replace suspect=suspect_sup[_n-1] if ssuid==ssuid[_n-1]
bys ssuid: keep if _n == 1
*5 bug: testing for equality needs == not =
bysort ssuid urid (sup) : gen suspect = string(urid) if (score <=60 | score==.) & urid==urid[_n-1]
sort ssuid suspect
replace suspect=suspect[_n-1]+","+suspect if suspect!="." & ssuid==ssuid[_n-1]
gen x=strlen(ssususpect_sup)
gsort ssuid -x
drop x
replace suspect=suspect_sup[_n-1] if ssuid==ssuid[_n-1]
bys ssuid: keep if _n == 1
*6 you refer to a variable -ssususpect_sup- which you haven't supplied as
* data, or created in your code. At this point I bail out.