Summarizing observations in Stata

Posted: 2016-08-29 11:14:36

Tags: string statistics stata data-management

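I am trying to summarize data in Stata. I have UK local authority codes (e.g. E06000047) and a dataset at a finer geographic level (MSOA, Middle Layer Super Output Area).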

MSOA code   MSOA name           Local authority code    Net weekly income
E02004297   County Durham 001   E06000047               480.00
E02004290   County Durham 002   E06000047               540.00
E02004298   County Durham 003   E06000047               520.00
E02004299   County Durham 004   E06000047               430.00
E02004291   County Durham 005   E06000047               400.00

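Since I am not interested in MSOA-level data, I want to summarize the data at the local authority code level. Where I get stuck is that I can't do calculations using string data. What I want to do is: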

foreach identical "Local authority code" take the mean/median and 
store it in a var "means.local-auth"

So what I expect is:

Local authority code  means.local-auth  median.local-auth
E06000047             474.00            480.00
E06000048             486.00            485.00

2 answers:

Answer 0 (score: 2)

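Your question does not make clear whether your goal is to produce a report (as in Nick's answer) or to take a first step toward analysis at the local authority level, so here is code that uses collapse to reduce your data to one observation per local authority.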

. * Example generated by -dataex-. To install: ssc install dataex
. clear

. input str20(msoa_c msoa_n lac) float income

          msoa_c              msoa_n         lac income
  1. "E02004297" "County Durham 001" "E06000047" 480
  2. "E02004290" "County Durham 002" "E06000047" 540
  3. "E02004298" "County Durham 003" "E06000047" 520
  4. "E02004299" "County Durham 004" "E06000047" 430
  5. "E02004291" "County Durham 005" "E06000047" 400
  6. end

. format income %9.2f

. drop msoa_c msoa_n 

. collapse (mean) mean_inc=income (median) med_inc=income, by(lac)

. list

     +--------------------------------+
     |       lac   mean_inc   med_inc |
     |--------------------------------|
  1. | E06000047     474.00    480.00 |
     +--------------------------------+

. 
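If the goal is only a report, and the MSOA-level observations should stay in memory, `tabstat` can display the same summaries without `collapse`. The sketch below re-enters the five example rows (variable names `lac` and `income` as in the `dataex` example above) and asks for the mean and median of `income` within each value of the string variable `lac`:

```stata
* Sketch: display per-local-authority summaries without replacing the data.
clear
input str20 lac float income
"E06000047" 480
"E06000047" 540
"E06000047" 520
"E06000047" 430
"E06000047" 400
end

* by() accepts a string variable; median is tabstat's synonym for p50
tabstat income, by(lac) statistics(mean median) format(%9.2f)
```

For the five rows shown, this should report a mean of 474 and a median of 480 for E06000047, the same values as the `collapse` result.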

Answer 1 (score: 1)

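For simple summaries like this, no loop is needed. Here is a reproducible example in which egen generates variables using the by() option, whose argument can be numeric or string and need not actually be a single variable. tabdisp is convenient for simple tabulations.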

sysuse auto, clear
egen mean_mpg = mean(mpg), by(rep78) 
egen median_mpg = median(mpg), by(rep78) 

tabdisp rep78, c(mean_mpg median_mpg) 

----------------------------------
Repair    |
Record    |
1978      |   mean_mpg  median_mpg
----------+-----------------------
        1 |         21          21
        2 |     19.125          18
        3 |   19.43333          19
        4 |   21.66667        22.5
        5 |   27.36364          30
        . |       21.4          22
----------------------------------

tabdisp rep78, c(mean_mpg median_mpg) format(%2.1f)

----------------------------------
Repair    |
Record    |
1978      |   mean_mpg  median_mpg
----------+-----------------------
        1 |       21.0        21.0
        2 |       19.1        18.0
        3 |       19.4        19.0
        4 |       21.7        22.5
        5 |       27.4        30.0
        . |       21.4        22.0
----------------------------------
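The same egen approach works directly with the question's string code, because by() accepts a string variable. A sketch using the five example rows from the question (variable names `lac` and `income` borrowed from the `dataex` example in the other answer); egen's `tag()` function marks one observation per local authority so that the listing shows each code once. The helper variable name `first` is arbitrary.

```stata
* Sketch: egen with a string by() variable, applied to the question's data.
clear
input str20 lac float income
"E06000047" 480
"E06000047" 540
"E06000047" 520
"E06000047" 430
"E06000047" 400
end

* group statistics are copied to every observation in the group
egen mean_inc = mean(income), by(lac)
egen med_inc  = median(income), by(lac)

* tag() is 1 for one observation per distinct lac, 0 otherwise
egen first = tag(lac)
list lac mean_inc med_inc if first, noobs
```

Unlike collapse, this keeps every MSOA row, which is handy if the local authority means are needed alongside the original observations.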