Split string data and corresponding values into new rows

Date: 2018-09-07 02:06:36

Tags: stata

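I have some data containing a string variable (US states), a corresponding integer variable (enrollment), and another string variable.

Unfortunately, some cells of the US states variable list multiple states separated by semicolons. I would like to split these into separate rows and then divide the corresponding enrollment equally among those states.

For example, I have: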

State       Enrollment   Severity
CA            100          Low
MA;PA         50           Medium
WA;OR;ID      120          High

I would like to convert this to:

State       Enrollment    Severity
CA             100          Low
MA             25           Medium
PA             25           Medium
WA             40           High
OR             40           High
ID             40           High

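I tried using the split command to separate the states and then calculating the respective enrollments in a convoluted way, but I am unsure how to use reshape.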


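Edit:

I would also like the solution to handle repeated states.

For example, input such as:

State       Enrollment   Severity
CA            100          Low
MA;CA         50           Medium
WA;CA;ID      120          High

should be split and have its enrollment divided in the same way.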

1 Answer:

Answer 0 (score: 2)

Here is one way to do this using the original data:

clear 
input str10 State Enrollment str10 Severity
"CA" 100 "Low"
"MA;PA" 50 "Medium"
"WA;OR;ID" 120 "High"
end

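* Tag each original row so it can be tracked after reshaping
generate id = _n
split State, p(;)
drop State
* State1, State2, ... become one State variable, one row per state
reshape long State, i(id)

keep if State != ""
* Caution: dividing by id gives the right shares here only because each
* row's id happens to equal the number of states it lists
bysort State (id): egen maxval = max(id)
bysort State (id): generate enrol = Enrollment / maxval
drop Enrollment
rename enrol Enrollment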

sort id
drop id _j maxval
order State Enrollment Severity

list, abbreviate(20)

     +-------------------------------+
     | State   Enrollment   Severity |
     |-------------------------------|
  1. |    CA          100        Low |
  2. |    MA           25     Medium |
  3. |    PA           25     Medium |
  4. |    OR           40       High |
  5. |    ID           40       High |
  6. |    WA           40       High |
     +-------------------------------+

Edit:

Here is one way to do what you want using the revised data:

clear
input str10 State Enrollment str10 Severity
"CA"            100          "Low"
"MA;CA"         50           "Medium"
"WA;CA;ID"      120          "High"
end

generate id = _n
split State, p(;)
drop State

reshape long State, i(id)

keep if State != ""
* Count the states listed in each original row, then share enrollment equally
bysort id: egen maxval = count(id)
bysort id: generate enrol = Enrollment / maxval
drop Enrollment
rename enrol Enrollment

sort id
drop id _j maxval
order State Enrollment Severity

list, abbreviate(20)

     +-------------------------------+
     | State   Enrollment   Severity |
     |-------------------------------|
  1. |    CA          100        Low |
  2. |    MA           25     Medium |
  3. |    CA           25     Medium |
  4. |    WA           40       High |
  5. |    CA           40       High |
  6. |    ID           40       High |
     +-------------------------------+
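
As a possible variation on the approach above, the per-row share can be computed before reshaping by counting semicolons, which avoids the helper variable created by egen. This is only a sketch under assumptions: the variable names nstates and pos are made up for illustration, and it assumes the State strings contain no stray spaces or trailing semicolons.

```stata
clear
input str10 State Enrollment str10 Severity
"CA"       100 "Low"
"MA;CA"    50  "Medium"
"WA;CA;ID" 120 "High"
end

* Number of states in a row = number of semicolons + 1
* (nstates is a hypothetical helper name)
generate nstates = strlen(State) - strlen(subinstr(State, ";", "", .)) + 1

* Share enrollment equally while there is still one row per original record
replace Enrollment = Enrollment / nstates

* Split into State1, State2, ... and stack them into one State variable
generate id = _n
split State, parse(;)
drop State
reshape long State, i(id) j(pos)

* Rows that listed fewer states leave empty slots after reshaping
drop if State == ""
drop id pos nstates
order State Enrollment Severity

list, abbreviate(20)
```

Because the division happens per original row rather than per state, repeated states such as CA are handled without any special treatment.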