Ranking variables by their percentage contribution to a total

Time: 2018-07-18 14:04:12

Tags: stata ranking rank

Consider the following example data:

 psu   |  sumsc   sumst   sumobc   sumother   sumcaste
-------|-----------------------------------------------
10018  |    3       2        0         4          9
10061  |    0       0        2         5          7
10116  |    1       1        2         4          8
10121  |    3       0        1         2          6
20002  |    4       1        0         1          6
-------------------------------------------------------

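I want to rank the variables sumsc, sumst, sumobc and sumother by their percentage contribution to sumcaste (which is the total of all the variables) within each psu.

Can someone help me do this in Stata?

2 answers:

Answer 0 (score: 1)

First, you need to calculate the percentages: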

clear

input psu sumsc sumst sumobc sumother sumcaste
10018 3 2 0 4 9
10061 0 0 2 5 7
10116 1 1 2 4 8
10121 3 0 1 2 6
20002 4 1 0 1 6
end

foreach var of varlist sumsc sumst sumobc sumother {
    generate pct_`var' = 100 * `var' / sumcaste
}

egen pcttotal = rowtotal(pct_*)

list pct_* pcttotal, abbreviate(15) noobs

  +--------------------------------------------------------------+
  | pct_sumsc   pct_sumst   pct_sumobc   pct_sumother   pcttotal |
  |--------------------------------------------------------------|
  |  33.33333    22.22222            0       44.44444        100 |
  |         0           0     28.57143       71.42857        100 |
  |      12.5        12.5           25             50        100 |
  |        50           0     16.66667       33.33333        100 |
  |  66.66666    16.66667            0       16.66667   99.99999 |
  +--------------------------------------------------------------+
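
The 99.99999 in the last row is most likely a storage-precision artifact rather than a data problem, because generate creates float variables by default. A minimal sketch, assuming the same data and variable names as above, that stores the percentages as double and checks the row totals against a tolerance (the 1e-8 tolerance is an illustrative choice, not part of the original answer):

```stata
clear
input psu sumsc sumst sumobc sumother sumcaste
10018 3 2 0 4 9
10061 0 0 2 5 7
10116 1 1 2 4 8
10121 3 0 1 2 6
20002 4 1 0 1 6
end

* store the percentages as double rather than the default float
foreach var of varlist sumsc sumst sumobc sumother {
    generate double pct_`var' = 100 * `var' / sumcaste
}

* request double for the row total as well
egen double pcttotal = rowtotal(pct_*)

* each row should now add up to 100 within a small tolerance
assert abs(pcttotal - 100) < 1e-8
```

The ranking itself is unaffected either way, since tied percentages are computed from identical expressions and remain exactly equal.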

Then, you need to obtain the ranks and do some gymnastics:

rowranks pct_*, generate(r_sumsc r_sumst r_sumobc r_sumother) field lowrank

mkmat r_*, matrix(A)
matrix A = A'
svmat A, names(row)

local matnames : rownames A
quietly generate name = " "

forvalues i = 1 / `: word count `matnames'' {
    quietly replace name = substr(`"`: word `i' of `matnames''"', 3, .) in `i'
}

ds row*

foreach var in `r(varlist)' {
    sort `var' name
    generate `var'b = sum(`var' != `var'[_n-1])
    drop `var'
    rename `var'b `var'
    list name `var' if name != " ", noobs
    display ""
}

The above will give you what you need:

  +-----------------+
  |     name   row1 |
  |-----------------|
  | sumother      1 |
  |    sumsc      2 |
  |    sumst      3 |
  |   sumobc      4 |
  +-----------------+

  +-----------------+
  |     name   row2 |
  |-----------------|
  | sumother      1 |
  |   sumobc      2 |
  |    sumsc      3 |
  |    sumst      3 |
  +-----------------+

  +-----------------+
  |     name   row3 |
  |-----------------|
  | sumother      1 |
  |   sumobc      2 |
  |    sumsc      3 |
  |    sumst      3 |
  +-----------------+

  +-----------------+
  |     name   row4 |
  |-----------------|
  |    sumsc      1 |
  | sumother      2 |
  |   sumobc      3 |
  |    sumst      4 |
  +-----------------+

  +-----------------+
  |     name   row5 |
  |-----------------|
  |    sumsc      1 |
  | sumother      2 |
  |    sumst      2 |
  |   sumobc      3 |
  +-----------------+

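Note that rowranks is a community-contributed command, so you need to install it before running the code above (if the command below cannot find the package, typing search rowranks should list it with an install link):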

net install pr0046.pkg
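
If installing community-contributed commands is not an option, a similar result can be sketched with official commands only. This is a different technique from the rowranks approach above: it reshapes the data to long form and uses egen's rank() function with the field option. The variable names pct1-pct4, r and dense are illustrative assumptions. Note that field ranking skips a rank after a tie (1, 2, 2, 4), so a second step rebuilds the consecutive ranks shown in the output above.

```stata
clear
input psu sumsc sumst sumobc sumother sumcaste
10018 3 2 0 4 9
10061 0 0 2 5 7
10116 1 1 2 4 8
10121 3 0 1 2 6
20002 4 1 0 1 6
end

* percentages under a common stub so that reshape can stack them
local i = 0
foreach var of varlist sumsc sumst sumobc sumother {
    local ++i
    generate double pct`i' = 100 * `var' / sumcaste
}

keep psu pct*
reshape long pct, i(psu) j(name)
label define name 1 "sumsc" 2 "sumst" 3 "sumobc" 4 "sumother"
label values name name

* field ranking: the largest percentage gets rank 1 and ties share a rank
bysort psu: egen r = rank(pct), field

* consecutive (dense) ranks without gaps after ties
bysort psu (r): generate dense = sum(r != r[_n-1])

list, sepby(psu) noobs
```

The dense variable is created with generate rather than by overwriting r, so each comparison uses the original field ranks.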

Answer 1 (score: 1)

First, we input the data:

clear all
set more off

input psu sumsc sumst sumobc sumother sumcaste
10018 3 2 0 4 9
10061 0 0 2 5 7
10116 1 1 2 4 8
10121 3 0 1 2 6
20002 4 1 0 1 6
end

Second, we prepare for the reshape:

local j=1
foreach var of varlist sumsc sumst sumobc sumother {
    gen temprl`j' = `var' / sumcaste
    ren `var' addi`j'
    local ++j
}

reshape long temprl addi, i(psu) j(ord)
lab def ord 1 "sumsc" 2 "sumst" 3 "sumobc" 4 "sumother"
lab val ord ord

Third, we sort before displaying (note that tied values receive distinct positions here, in arbitrary order):

gsort psu -temprl
by psu: gen nro=_n
drop temprl
order psu nro ord

Fourth, we display the data:

br psu nro ord addi

Edit:

Here is a combination of Aron's solution and mine (@PearlySpencer):

clear

input psu sumsc sumst sumobc sumother sumcaste
10018 3 2 0 4 9
10061 0 0 2 5 7
10116 1 1 2 4 8
10121 3 0 1 2 6
20002 4 1 0 1 6
end

local i = 0
foreach var of varlist sumsc sumst sumobc sumother {
    local ++i
    generate pct`i' = 100 * `var' / sumcaste
    rename `var' temp`i'
    local rvars "`rvars' r`i'"                  
}

rowranks pct*, generate("`rvars'") field lowrank

reshape long pct temp r, i(psu) j(name)

label define name 1 "sumsc" 2 "sumst" 3 "sumobc" 4 "sumother"
label values name name

keep psu name pct r
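* build the consecutive rank in a new variable: replace works observation by
* observation, so r[_n-1] would already hold the updated value after a tie
bysort psu (r): generate rdense = sum(r != r[_n-1])
drop r
rename rdense r

This will give you the desired output: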

list, sepby(psu) noobs

  +---------------------------------+
  |   psu       name        pct   r |
  |---------------------------------|
  | 10018   sumother   44.44444   1 |
  | 10018      sumsc   33.33333   2 |
  | 10018      sumst   22.22222   3 |
  | 10018     sumobc          0   4 |
  |---------------------------------|
  | 10061   sumother   71.42857   1 |
  | 10061     sumobc   28.57143   2 |
  | 10061      sumsc          0   3 |
  | 10061      sumst          0   3 |
  |---------------------------------|
  | 10116   sumother         50   1 |
  | 10116     sumobc         25   2 |
  | 10116      sumst       12.5   3 |
  | 10116      sumsc       12.5   3 |
  |---------------------------------|
  | 10121      sumsc         50   1 |
  | 10121   sumother   33.33333   2 |
  | 10121     sumobc   16.66667   3 |
  | 10121      sumst          0   4 |
  |---------------------------------|
  | 20002      sumsc   66.66666   1 |
  | 20002      sumst   16.66667   2 |
  | 20002   sumother   16.66667   2 |
  | 20002     sumobc          0   3 |
  +---------------------------------+

This approach is useful if you need the variables for further analysis rather than just for displaying the results.
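
As one illustration of such further analysis, the sketch below keeps only the top-ranked category in each psu. To stay self-contained it re-enters two of the psu blocks from the listing above as long-format data; the choice of those two blocks and the keep if r == 1 step are illustrative assumptions, not part of the original answer.

```stata
clear
input psu name pct r
10018 4 44.44444 1
10018 1 33.33333 2
10018 2 22.22222 3
10018 3 0        4
20002 1 66.66666 1
20002 2 16.66667 2
20002 4 16.66667 2
20002 3 0        3
end

label define name 1 "sumsc" 2 "sumst" 3 "sumobc" 4 "sumother"
label values name name

* keep the top-ranked category in each psu; a tie for first place would keep several rows
keep if r == 1
list psu name pct, noobs
```

Because r is an ordinary variable, the same pattern extends to conditions such as r <= 2 or to tabulating name by rank.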