How to split observations in Stata

Date: 2018-05-23 21:15:39

Tags: split stata

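I have been trying to transform my Stata dataset so that each observation whose Number holds several semicolon-separated values is split into one observation per value. Number, Cluster, and Rating are my three variables, and all values are strings.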

Do you have any Stata-specific suggestions?

Thanks!

1 answer:

Answer 0 (score: 1)

There is no good data example and no code showing what you tried, but here is a suggestion:

clear
input str7 Number str3 Cluster str6 Rating
"017;092" "Z12" "High"  
"400;401" "Z14" "Medium"
"523"     "Z98" "Low"   
end

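* turn the semicolons into spaces so that each element is a Stata word
replace Number = subinstr(Number, ";", " ", .)
* number of elements in each observation
gen count = wordcount(Number)
* identifier of the original observation
gen long id = _n
* make count copies of each observation
expand count
* the kth copy within each id keeps the kth element
bysort id : replace Number = word(Number, _n)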

list, sepby(id)

     +----------------------------------------+
     | Number   Cluster   Rating   count   id |
     |----------------------------------------|
  1. |    017       Z12     High       2    1 |
  2. |    092       Z12     High       2    1 |
     |----------------------------------------|
  3. |    400       Z14   Medium       2    2 |
  4. |    401       Z14   Medium       2    2 |
     |----------------------------------------|
  5. |    523       Z98      Low       1    3 |
     +----------------------------------------+

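This code depends on the assumption that, once the semicolons in Number are replaced by spaces, each remaining element is a single word in Stata's sense, that is, it contains no spaces of its own. If that is not the case, say which values are problematic, and alternative code can be suggested.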
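To see why the assumption matters, here is a small sketch using a hypothetical value "A 1;B", whose intended elements are "A 1" and "B". Once the semicolon becomes a space, wordcount() sees three words, whereas counting the semicolons gives the intended two elements. The variable names n_words and n_parts are made up for illustration:

```stata
clear
* hypothetical value whose first element contains a space
input str7 Number
"A 1;B"
end

* what the approach above would count: "A 1 B" has three words
gen n_words = wordcount(subinstr(Number, ";", " ", .))
* elements counted as (number of semicolons) + 1
gen n_parts = length(Number) - length(subinstr(Number, ";", "", .)) + 1
list
```

With such data, expand count would create one copy too many and word() would split "A 1" in two.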

Edit: more general code, which splits on the semicolons directly and so does not rely on that assumption:

clear
input str7 Number str3 Cluster str6 Rating
"017;092" "Z12" "High"  
"400;401" "Z14" "Medium"
"523"     "Z98" "Low"   
end

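* split Number at the semicolons into Number1, Number2, ...
split Number, parse(;)
* r(varlist) holds the names of the new variables
local nvars : word count `r(varlist)' 
gen long id = _n
* one copy of each observation per new variable
expand `nvars' 

* the jth copy within each id takes the jth part
forval j=1/`nvars' { 
    bysort id: replace Number = Number`j' if _n == `j' 
}

* observations with fewer parts leave empty copies behind
drop if missing(Number) 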

list, sepby(id)
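A different technique, not the one used in this answer, is to let reshape do the expansion after split. This is a sketch on the same example data; the names id and seq are arbitrary choices:

```stata
clear
input str7 Number str3 Cluster str6 Rating
"017;092" "Z12" "High"
"400;401" "Z14" "Medium"
"523"     "Z98" "Low"
end

* Number1, Number2, ... hold the semicolon-separated parts (kept as strings)
split Number, parse(;)
* free the name Number for the reshaped variable
drop Number
gen long id = _n
* one observation per (id, part); seq records the part's position
reshape long Number, i(id) j(seq)
* observations with fewer parts leave empty slots behind
drop if missing(Number)
list, sepby(id)
```

Compared with the loop, this keeps a seq variable recording each element's original position, which can be handy if the order of the parts matters later.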