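This is mostly a logic question.
I am trying to identify patterns in how a group of people take their medications. My first step is to find "continuous users" of 4 drugs. I define continuous use as repeat prescriptions of all 4 drugs after the 4th drug is first prescribed.
Some people use 4 drugs continuously once they start their fourth drug. I find the fourth drug (the fourth drugs I am interested in are B, Q, S and T), and then check whether the person continues on the 4-drug combination A + C + D + the fourth drug. This is how I handle the 4-drug case (mini dataset below, labelled "4 drug users"):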
bys id: gen interest=0
by id: replace interest =1 if (agent_type == "T" | agent_type =="Q" | agent_type =="S" | agent_type =="B") ///
& con_4_4 ==1 & count==4
by id: egen interest4=max(interest) //notes: this variable tells me if the person has a 4th drug of interest to me; drug B, Q, S or T
gen acd_4_1=0
by id: replace acd_4_1 =1 if (agent_type == "A"| agent_type =="C" | agent_type=="D") & count==1
gen acd_4_2=0
by id: replace acd_4_2 =1 if (agent_type == "A"| agent_type =="C" | agent_type=="D") & count==2
gen acd_4_3=0
by id: replace acd_4_3 =1 if (agent_type == "A"| agent_type =="C" | agent_type=="D") & count==3
by id: egen acd_4_11 =max(acd_4_1)
by id: egen acd_4_22 =max(acd_4_2)
by id: egen acd_4_33 =max(acd_4_3)
gen acd_4=1 if acd_4_11 ==1 & acd_4_22 ==1 & acd_4_33 ==1 & interest4==1 //acd_4 is a variable indicating whether people had the desired pattern after initiating their 4th agent
*notes:
*kate has acd_4 = . because she used a prohibited drug "Q" and also her 4th drug was not of interest to us (was "A" as opposed to T, Q, S or B)
*mark has acd_4==1 because he used the correct pattern A+C+D after the prescription of his 4th drug which was S (count=4, date 5th October 2000)
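Now it gets trickier. Other people, who may be switching or stopping drugs, might not reach continuous use of 4 drugs until their 5th or 6th drug. For example, only after their 5th drug do they have repeat prescriptions of A + C + D plus drug 5, where drug 5 is one of the drugs of interest (again B, Q, S or T).
If, on top of their drug of interest and the pattern of interest, they are also taking another of B, Q, S or T, I want to flag that, because I want to exclude that person's pattern from further consideration. For example, I want med5 + A + C + D, not med5 + A + C + D + S.
I have found a way to do this (mini dataset below, labelled "5 drug users"), but my code is clunky and takes a long time to run on my large dataset. Can anyone suggest how to 1) improve my logic, 2) improve my code, or 3) both?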
gen interest5=0
bys id: replace interest5 =1 if (agent_type == "T" | agent_type =="Q" | agent_type =="S" | agent_type =="B") ///
& con_5_5 ==1 & count==5
by id: egen interest55 = max(interest5)
drop interest5
ren interest55 interest5
by id: gen A5=1 if (agent_type =="A") & (rx_date >fifth_con_full & rx_date <=fifth_con_full+180) & interest5==1
by id: egen AA55=max(A5)
drop A5
by id: gen C5=1 if (agent_type =="C") & (rx_date >fifth_con_full & rx_date <=fifth_con_full+180) & interest5==1
by id: egen C55=max(C5)
drop C5
by id: gen D5=1 if (agent_type =="D") & (rx_date >fifth_con_full & rx_date <=fifth_con_full+180) & interest5==1
by id: egen D55=max(D5)
drop D5
by id: gen acd_5=1 if (AA55==1 & C55==1 & D55==1) & interest5==1
*make sure patient isn't taking any of the other comparator agents
by id: gen prohib=1 if (agent_type == "T" | agent_type =="Q" | agent_type =="S" | agent_type =="B") ///
& (rx_date >fifth_con_full & rx_date <=fifth_con_full+180) & interest5==1 & count!=5 //count!=5 makes Stata flag any comparator agent the patient is taking other than the comparator agent of interest, which here is the one with count==5
by id: egen prohib55=max(prohib)
by id: gen pattern=1 if acd_5 ==1 & prohib55 !=1
*notes:
*mary has pattern = . because she used a prohibited drug "B" after the prescription of her 4th agent (here count=5, agent_type "T", starting on 29th July 05)
*Pat has pattern=1 because he used A+C+D after his 4th agent (here count=5, agent_type==B, starting on 28th Jan 09)
*Sue has pattern=. because she used a prohibited drug "T" after the prescription of her 4th agent (here count=5, agent_type==B, starting on 25th Feb 2011)
Dataset
* Example generated by -dataex-. To install: ssc install dataex
clear
input str4 id int rx_date str1 agent_type byte count int fourth_full byte con_4_4 int fourth_con_full
"kate" 16728 "Q" 1 . 1 16733
"kate" 16728 "C" 3 . 1 16733
"kate" 16733 "A" 4 16733 1 16733
"kate" 16758 "B" 2 16733 1 16733
"kate" 16758 "Q" 1 16733 1 16733
"kate" 16758 "C" 3 16733 1 16733
"kate" 16762 "A" 4 16733 1 16733
"kate" 16784 "C" 3 16733 1 16733
"kate" 16784 "A" 4 16733 1 16733
"kate" 16784 "Q" 1 16733 1 16733
"kate" 16784 "B" 2 16733 1 16733
"kate" 16812 "Q" 1 16733 1 16733
"kate" 16812 "B" 2 16733 1 16733
"kate" 16812 "A" 4 16733 1 16733
"kate" 16812 "C" 3 16733 1 16733
"kate" 16841 "Q" 1 16733 1 16733
"kate" 16841 "C" 3 16733 1 16733
"kate" 16841 "B" 2 16733 1 16733
"mark" 14874 "C" 2 . 1 14888
"mark" 14874 "A" 1 . 1 14888
"mark" 14888 "S" 4 14888 1 14888
"mark" 14888 "D" 3 14888 1 14888
"mark" 14930 "S" 4 14888 1 14888
"mark" 14930 "C" 2 14888 1 14888
"mark" 14930 "A" 1 14888 1 14888
"mark" 14930 "D" 3 14888 1 14888
"mark" 14965 "S" 4 14888 1 14888
"mark" 14965 "A" 1 14888 1 14888
"mark" 14965 "D" 3 14888 1 14888
"mark" 14965 "C" 2 14888 1 14888
"mark" 15028 "S" 4 14888 1 14888
"mark" 15028 "C" 2 14888 1 14888
"mark" 15028 "A" 1 14888 1 14888
"mark" 15028 "D" 3 14888 1 14888
"mark" 15097 "C" 2 14888 1 14888
"mark" 15097 "A" 1 14888 1 14888
"mark" 15097 "D" 3 14888 1 14888
"mark" 15097 "S" 4 14888 1 14888
end
format %tddd-Mon-YY rx_date
format %tddd-Mon-YY fourth_full
format %tddd-Mon-YY fourth_con_full
* Example generated by -dataex-. To install: ssc install dataex
clear
input str4 id int rx_date str1 agent_type byte count int fifth_full byte con_5_5 int fifth_con_full
"pat" 17910 "D" 1 . 1 17925
"pat" 17910 "A" 4 . 1 17925
"pat" 17910 "C" 2 . 1 17925
"pat" 17925 "B" 5 17925 1 17925
"pat" 17948 "B" 5 17925 1 17925
"pat" 17969 "C" 2 17925 1 17925
"pat" 17969 "B" 5 17925 1 17925
"pat" 17969 "D" 1 17925 1 17925
"pat" 17969 "A" 4 17925 1 17925
"pat" 18028 "D" 1 17925 1 17925
"pat" 18028 "B" 5 17925 1 17925
"pat" 18028 "C" 2 17925 1 17925
"pat" 18028 "A" 4 17925 1 17925
"pat" 18081 "D" 1 17925 1 17925
"pat" 18081 "C" 2 17925 1 17925
"mary" 16618 "C" 2 . 1 16646
"mary" 16618 "D" 3 . 1 16646
"mary" 16618 "B" 1 . 1 16646
"mary" 16646 "T" 5 16646 1 16646
"mary" 16679 "A" 4 16646 1 16646
"mary" 16679 "C" 2 16646 1 16646
"mary" 16679 "D" 3 16646 1 16646
"mary" 16679 "B" 1 16646 1 16646
"mary" 16681 "T" 5 16646 1 16646
"mary" 16737 "D" 3 16646 1 16646
"mary" 16737 "B" 1 16646 1 16646
"mary" 16737 "A" 4 16646 1 16646
"sue" 18676 "D" 3 . 1 18683
"sue" 18676 "C" 2 . 1 18683
"sue" 18676 "T" 4 . 1 18683
"sue" 18683 "B" 5 18683 1 18683
"sue" 18729 "C" 2 18683 1 18683
"sue" 18729 "B" 5 18683 1 18683
"sue" 18729 "T" 4 18683 1 18683
"sue" 18729 "D" 3 18683 1 18683
"sue" 18730 "C" 2 18683 1 18683
"sue" 18779 "C" 2 18683 1 18683
"sue" 18779 "T" 4 18683 1 18683
"sue" 18779 "D" 3 18683 1 18683
"sue" 18826 "A" 1 18683 1 18683
"sue" 18834 "C" 2 18683 1 18683
"sue" 18834 "T" 4 18683 1 18683
"sue" 18834 "D" 3 18683 1 18683
"sue" 18889 "D" 3 18683 1 18683
end
format %tddd-Mon-YY rx_date
format %tddd-Mon-YY fifth_full
format %tddd-Mon-YY fifth_con_full
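Answer 0 (score: 0)
This is not an answer, but it won't fit in a comment. Your code is clear but can be condensed. For example, the first block can be reduced to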
gen interest = inlist(agent_type, "T", "Q", "S", "B") & con_4_4 ==1 & count==4
bysort id: egen interest4 = max(interest)
gen acd_4_1 = inlist(agent_type, "A", "C", "D") & count==1
gen acd_4_2 = inlist(agent_type, "A", "C", "D") & count==2
gen acd_4_3 = inlist(agent_type, "A", "C", "D") & count==3
by id: egen acd_4_11 = max(acd_4_1)
by id: egen acd_4_22 = max(acd_4_2)
by id: egen acd_4_33 = max(acd_4_3)
gen acd_4= 1 if acd_4_11 ==1 & acd_4_22 ==1 & acd_4_33 ==1 & interest4==1
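That takes 13 lines down to 9.
This is only cosmetic, but the main point is that you want your real question to be clear and to get answered.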
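The small techniques used there include:
Omitting by: when it has no effect on the result.
Boiling down generate and replace pairs so that a 0,1 variable is created in a single statement.
Using inlist() to capture alternatives concisely.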
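As a rough illustration, the same three techniques could be applied to the second block from the question. This is only a sketch, not a tested replacement for the full dataset: the helper names in_window, comparator, hasA, hasC and hasD are made up here, the input is a handful of rows copied from the question's second dataex listing, and pattern comes out as 0/1 rather than 1/missing.

```stata
* Sketch only: a few rows copied from the question's second dataex listing.
clear
input str4 id int rx_date str1 agent_type byte count int fifth_full byte con_5_5 int fifth_con_full
"pat" 17925 "B" 5 17925 1 17925
"pat" 17969 "C" 2 17925 1 17925
"pat" 17969 "D" 1 17925 1 17925
"pat" 17969 "A" 4 17925 1 17925
"mary" 16646 "T" 5 16646 1 16646
"mary" 16679 "A" 4 16646 1 16646
"mary" 16679 "C" 2 16646 1 16646
"mary" 16679 "D" 3 16646 1 16646
"mary" 16679 "B" 1 16646 1 16646
"sue" 18683 "B" 5 18683 1 18683
"sue" 18729 "C" 2 18683 1 18683
"sue" 18729 "T" 4 18683 1 18683
"sue" 18729 "D" 3 18683 1 18683
"sue" 18826 "A" 1 18683 1 18683
end

* Hypothetical helpers (not in the question): the 180-day window after
* fifth_con_full, and a flag for the comparator agents T, Q, S, B.
gen byte in_window = rx_date > fifth_con_full & rx_date <= fifth_con_full + 180
gen byte comparator = inlist(agent_type, "T", "Q", "S", "B")

* Person-level flag: the 5th drug is a comparator agent of interest.
bysort id: egen byte interest5 = max(comparator & con_5_5 == 1 & count == 5)

* One person-level flag per background drug (A, C, D), built in a loop
* instead of three separate gen/egen/drop triples.
foreach a in A C D {
    by id: egen byte has`a' = max(agent_type == "`a'" & in_window)
}

* Any other comparator agent in the window; count != 5 excludes the
* comparator agent of interest itself.
by id: egen byte prohib55 = max(comparator & in_window & count != 5)

* 0/1 here, whereas the question's version is 1/missing.
gen byte pattern = interest5 & hasA & hasC & hasD & !prohib55

tabulate id pattern
```

On these rows pat ends up with pattern 1 and mary and sue with 0, which agrees with the notes in the question. Each egen call replaces a gen/egen/drop sequence, so this may also run faster on a large dataset, though I have not timed it.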
Rewriting the question more simply would make it more likely that someone tries to solve your real problem.