Question

This has probably been answered already, but I must just be searching for the wrong terms. Suppose I am using the built-in Stata data set auto:

sysuse auto, clear

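Say, for example, that I am working with 1 independent and 1 dependent variable, and I want to compress the data down to essentially the IQR elements: min, p(25), median, p(75), max ... so I use the commands: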

keep weight mpg

sum weight, detail

return list

local min=r(min)

local lqr=r(p25)

local med = r(p50)

local uqr = r(p75)

local max = r(max)

keep if weight==`min' | weight==`max' | weight==`med' | weight==`lqr' | weight==`uqr'

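So I want to compress the data set down to just those 5 observations. In this case, however, the median is not actually an element of the weight vector: there is one observation above it and one below it (not surprising, given the definition of the median). Is there a way to tell Stata to look for the nearest neighbor above the percentile? That is, if r(p50) is not an element of weight, search above that value for the next observation? The end result is that I am trying to get the data down to 2 vectors, say weight and mpg, such that each of the 5 weight elements in the IQR has a matching response in mpg. Any thoughts?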

Answer 1

I think you want something like this:

clear all
set more off

sysuse auto
keep weight mpg

summarize weight, detail

local min = r(min)
local lqr = r(p25)
local med = r(p50)
local uqr = r(p75)
local max = r(max)

* differences between weights and its median
gen diff = abs(weight - `med')

* put the smallest difference in observation 1 (there can be several, watch out!)
isid diff weight mpg, sort

* replace the original median with the weight "closest" to the median
local med = weight[1]

keep if inlist(weight, `min', `lqr', `med', `uqr', `max')
drop diff

* pretty print
order weight mpg
sort weight mpg
list, sep(0)

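Note that the median itself does not appear, because we kept its "closest" neighbor instead (weight == 3,180). Also, the 75th percentile has two associated mpg values.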

You can use collapse and merge (and many more) to tackle the problem, but I will leave it at that.

Use help <command> for anything that is not clear.
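The question asks specifically for the nearest neighbor above a percentile, whereas the code above keeps the closest value on either side of the median. A minimal sketch of the "at or above" variant follows. It is not part of the original answer, and the names `targets` and `keepme` are illustrative assumptions:

```stata
sysuse auto, clear
keep weight mpg

quietly summarize weight, detail
* the five target values; any of them may fall between observed weights
local targets `r(min)' `r(p25)' `r(p50)' `r(p75)' `r(max)'

generate byte keepme = 0
foreach t of local targets {
    * smallest observed weight that is at or above the target
    quietly summarize weight if weight >= `t', meanonly
    quietly replace keepme = 1 if weight == r(min)
}

keep if keepme
drop keepme
sort weight mpg
list, sep(0)
```

As in the code above, ties in weight mean that more than five observations can survive, so a further rule is needed if exactly one mpg value per weight is required.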

Answer 2

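Thanks for all the suggestions; here is what I came up with. The idea is that I am pulling out these 5 numbers so that I can send them to Mata for a cubic spline I am trying to write. For whatever reason, trying to generalize this was giving me a headache.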

My final solution:

    sysuse auto, clear
    preserve
    sort weight
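    count if weight<.
    * use r(N), the non-missing count, for the last row too: missing weights sort last, so _n==_N could pick one
    keep if _n==1 | _n==ceil(r(N)/4) | _n==ceil(r(N)/2) | _n==ceil(3*r(N)/4) | _n==r(N)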
    gen X = weight
    gen Y = mpg
    list X Y 
    /* at this point I will send X and Y to mata for the cubic spline 
    routine that I am in the process of writing. It was this little step that 
    was bugging me. */

    restore
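One possible way to hand the two columns to Mata is `st_data()`. The sketch below assumes that plain column vectors are all the spline routine needs; the routine itself is not shown in this answer. In the workflow above, this step would have to run before `restore`:

```stata
sysuse auto, clear
sort weight
quietly count if weight < .
* keep the rows at the min, quartile, median, and max positions
keep if inlist(_n, 1, ceil(r(N)/4), ceil(r(N)/2), ceil(3*r(N)/4), r(N))

mata:
X = st_data(., "weight")   // column vector of the kept weights
Y = st_data(., "mpg")      // matching mpg responses
X, Y                       // display the two columns side by side
end
```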

Nearest neighbor to a percentile in Stata
