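I have some data that contains a string variable (US states), a corresponding integer variable (enrollment), and another string variable.
Unfortunately, some cells under the US states variable list multiple states, separated by semicolons. I would like to split these into separate rows and then divide the corresponding enrollment equally among those states.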
For example, I have:
State Enrollment Severity
CA 100 Low
MA;PA 50 Medium
WA;OR;ID 120 High
I would like to convert this to:
State Enrollment Severity
CA 100 Low
MA 25 Medium
PA 25 Medium
WA 40 High
OR 40 High
ID 40 High
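I tried using the split command to separate them and then calculate the corresponding enrollment (in a convoluted way), but I am not sure how to use reshape.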
Edit:
I would also like the solution to handle repeated states.
For example, the following input should be handled as well:
State Enrollment Severity
CA 100 Low
MA;CA 50 Medium
WA;CA;ID 120 High
Answer 0 (score: 2)
Here is one way to do this using the original data:
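clear
input str10 State Enrollment str10 Severity
"CA" 100 "Low"
"MA;PA" 50 "Medium"
"WA;OR;ID" 120 "High"
end
* Remember the original row number of each observation
generate id = _n
* Split on ";" into State1, State2, ... and stack them into a single State column
split State, p(;)
drop State
reshape long State, i(State?)
drop State?
* Discard the empty slots left by rows that list fewer states
keep if State != ""
* Caution: max(id) equals the number of states per row only because row k of
* this example happens to list exactly k states; the revised version below counts instead
bysort State (id): egen maxval = max(id)
bysort State (id): generate enrol = Enrollment / maxval
drop Enrollment
rename enrol Enrollment
* Restore the original row order and tidy up
sort id
drop id _j maxval
order State Enrollment Severity
list, abbreviate(20)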
+-------------------------------+
| State Enrollment Severity |
|-------------------------------|
1. | CA 100 Low |
2. | MA 25 Medium |
3. | PA 25 Medium |
4. | OR 40 High |
5. | ID 40 High |
6. | WA 40 High |
+-------------------------------+
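The divisor used in the steps above can also be derived directly from the string, before any reshaping. The following is a minimal sketch, assuming that states are always separated by a single semicolon with no trailing separator; the variable names nstates and share are illustrative and not part of the answer.

```stata
clear
input str10 State Enrollment str10 Severity
"CA" 100 "Low"
"MA;PA" 50 "Medium"
"WA;OR;ID" 120 "High"
end

* Number of states in a row = number of semicolons + 1 (assumed format)
generate nstates = length(State) - length(subinstr(State, ";", "", .)) + 1

* Equal share of the enrollment for each state listed in the row
generate share = Enrollment / nstates

list, abbreviate(20)
```

Because the count comes from the row's own string, it does not depend on the row number or on whether a state appears in more than one row.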
Edit:
Here is one way to do what you want using the revised data:
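clear
input str10 State Enrollment str10 Severity
"CA" 100 "Low"
"MA;CA" 50 "Medium"
"WA;CA;ID" 120 "High"
end
* Remember the original row number; it identifies each source row after reshaping
generate id = _n
* Split on ";" into State1, State2, ... and stack them into a single State column
split State, p(;)
drop State
reshape long State, i(id)
* Discard the empty slots left by rows that list fewer states
keep if State != ""
* maxval holds the number of states that came from each original row
bysort id: egen maxval = count(id)
bysort id: generate enrol = Enrollment / maxval
drop Enrollment
rename enrol Enrollment
* Restore the original row order and tidy up
sort id
drop id _j maxval
order State Enrollment Severity
list, abbreviate(20)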
+-------------------------------+
| State Enrollment Severity |
|-------------------------------|
1. | CA 100 Low |
2. | MA 25 Medium |
3. | CA 25 Medium |
4. | WA 40 High |
5. | CA 40 High |
6. | ID 40 High |
+-------------------------------+
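As an alternative to split followed by reshape, the same result can be sketched with expand, which duplicates each row once per listed state. This is only a sketch under stated assumptions: state codes contain no spaces (word() splits on blanks, so a full name such as "New York" would break it), separators are single semicolons, and nstates is an illustrative variable name that does not appear in the answer.

```stata
clear
input str10 State Enrollment str10 Severity
"CA" 100 "Low"
"MA;CA" 50 "Medium"
"WA;CA;ID" 120 "High"
end

generate id = _n

* Number of states in a row = number of semicolons + 1 (assumed format)
generate nstates = length(State) - length(subinstr(State, ";", "", .)) + 1

* Create one copy of each row per listed state
expand nstates

* Within each original row, give the k-th copy the k-th state code
bysort id: replace State = word(subinstr(State, ";", " ", .), _n)

* Share the enrollment equally among the copies
replace Enrollment = Enrollment / nstates

drop id nstates
list, abbreviate(20)
```

Grouping by id rather than by State is what keeps repeated states such as CA separate, since each copy is tied to the row it came from.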