How to simply create a new variable based on ranges of another variable

Time: 2015-09-24 16:09:46

Tags: variables stata

Say I have a continuous variable var1:

clear
set obs 1000
gen var1 = runiform()
sum var1

Now I want to create var2 based on ranges of var1. I can do this:

gen var2 = "Lowest" if var1<.25
replace var2 = "Low" if var1>=.25 & var1<.5
replace var2 = "High" if var1>=.5 & var1<.75
replace var2 = "Highest" if var1>=.75

I would like to be able to do this in one line. Pseudocode:

gen var2 = (ranges(0 .25 .5 .75 1) values("Lowest" "Low" "High" "Highest"))

A way to do something similar in R, using cut, can be found at Create categorical variable in R based on range.

Is there a command in Stata that can do something like the R version? Imagine there were 10,000 ranges that needed to go into var2. A better way would then help a lot.

Another way to do this in one line in Stata, which is clunky, can be found at http://www.stata.com/support/faqs/data-management/multiple-operations/:

generate var2 = cond(var1<=.25, "Lowest", cond(var1<=.50, "Low", cond(var1<=.75, "High", cond(var1<=1.00, "Highest", ""))))

Is there a better way?

2 answers:

Answer 0 (score: 3)

The cond() function is the so-called clunky function. See var3 below for an example. It has the notable advantage that you spell out the inequalities explicitly in code and get exactly what you want, which is not the case with egen, cut().

In this particular example, at least one other trick is possible; see var4 in the log below:

. clear
. set obs 15
number of observations (_N) was 0, now 15
. set seed 2803
. gen var1 = runiform()
. sort var1
. gen var2 = "Lowest" if var1<.25
(9 missing values generated)
. replace var2 = "Low" if var1>=.25 & var1<.5
(4 real changes made)
. replace var2 = "High" if var1>=.5 & var1<.75
(2 real changes made)
. replace var2 = "Highest" if var1>=.75
variable var2 was str6 now str7
(3 real changes made)
. gen var3 = cond(var1 < .25, "Lowest", cond(var1 <.5, "Low", cond(var1 <.75, "High", "Highest")))
. gen var4 = word("Lowest Low High Highest", ceil(4 * var1))
. list

     +----------------------------------------+
     |     var1      var2      var3      var4 |
     |----------------------------------------|
  1. | .0200225    Lowest    Lowest    Lowest |
  2. | .0360774    Lowest    Lowest    Lowest |
  3. | .0934085    Lowest    Lowest    Lowest |
  4. | .0950848    Lowest    Lowest    Lowest |
  5. | .1040797    Lowest    Lowest    Lowest |
     |----------------------------------------|
  6. | .1795591    Lowest    Lowest    Lowest |
  7. | .3326341       Low       Low       Low |
  8. | .3383934       Low       Low       Low |
  9. | .3870576       Low       Low       Low |
 10. | .3980427       Low       Low       Low |
     |----------------------------------------|
 11. | .6264514      High      High      High |
 12. | .6305373      High      High      High |
 13. | .7739685   Highest   Highest   Highest |
 14. | .7935746   Highest   Highest   Highest |
 15. | .9243789   Highest   Highest   Highest |
     +----------------------------------------+
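The var4 trick works because the four bins have equal width, so ceil(4 * var1) is directly the word position. As a hedged sketch for cut points that are not equally spaced, Stata's built-in irecode() function can supply the position instead; the variable name var5 is hypothetical. One difference to keep in mind: irecode() puts a value equal to a cut point into the lower bin, so its intervals are (a, b] rather than the [a, b) of the replace version, which matters only for exact ties.

```stata
clear
set obs 15
set seed 2803
gen double var1 = runiform()

* irecode(x, x1, ..., xn) returns 0 if x <= x1, 1 if x1 < x <= x2, ...,
* and n if x > xn; adding 1 turns that into a word position from 1 to 4
gen var5 = word("Lowest Low High Highest", irecode(var1, .25, .5, .75) + 1)

list var1 var5
```

Unequal breaks only require editing the argument list, e.g. irecode(var1, .1, .6, .9), keeping the words in the same order.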

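However, if you really had 10,000 ranges to specify, and they did not reduce to some simple rule, you would naturally not do it this way. You should put them in a file and use some code based on merge.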
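The file-based code is not shown in the answer. Below is a minimal sketch of one way to do it, under stated assumptions: the ranges are contiguous, each is described by its lower bound and a label, and each range runs up to the next lower bound. Because merge matches only on exact key values, this sketch swaps in a different technique, appending the bounds to the data and sorting, instead of merge. All names here (lower, rangelabel, isbound, key, id, the tempfile ranges) are hypothetical; with 10,000 ranges the input block would be replaced by loading your own file.

```stata
* Hypothetical lookup data: one row per range, holding its lower bound
* and its label. With many ranges this would come from a file instead.
clear
input double lower str7 rangelabel
0    "Lowest"
.25  "Low"
.5   "High"
.75  "Highest"
end
tempfile ranges
save `ranges'

* Example data, as in the question
clear
set obs 1000
set seed 2803
gen double var1 = runiform()
gen long id = _n

* Stack the bounds with the data and sort everything on one key.
* A bound sorts before any data value equal to it, so each range
* includes its lower bound: [lower, next lower).
append using `ranges'
gen byte isbound = !missing(lower)
gen double key = cond(isbound, lower, var1)
gsort key -isbound

* Carry each label forward onto the data rows that follow its bound;
* replace works through the observations in order, so this cascades
replace rangelabel = rangelabel[_n-1] if !isbound & _n > 1

* Drop the bound rows and restore the original order
drop if isbound
drop lower key isbound
rename rangelabel var2
sort id
```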

Answer 1 (score: 2)

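Stata does have a cut function, as part of the egen command. Using its options, and defining and assigning value labels, gives the desired result (in three lines rather than one, but three fairly concise lines). For example: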

clear
set obs 15
gen var1 = runiform()
sum var1

gen var2 = "Lowest" if var1<.25
replace var2 = "Low" if var1>=.25 & var1<.5
replace var2 = "High" if var1>=.5 & var1<.75
replace var2 = "Highest" if var1>=.75

// =======================================================
// Using egen , cut()
// =======================================================
label define rank 0 "Lowest" 1 "Low" 2 "High" 3 "Highest"
egen var3 = cut(var1) , at(0(.25)1) icodes
label values var3 rank

li

Result:

     +------------------------------+
     |     var1      var2      var3 |
     |------------------------------|
  1. | .6658295      High      High |
  2. | .3690664       Low       Low |
  3. | .5983131      High      High |
  4. | .2658775       Low       Low |
  5. | .1211114    Lowest    Lowest |
     |------------------------------|
  6. | .2296222    Lowest    Lowest |
  7. | .7229139      High      High |
  8. | .2501513       Low       Low |
  9. | .7775574   Highest   Highest |
 10. | .2839603       Low       Low |
     |------------------------------|
 11. | .8396428   Highest   Highest |
 12. | .4838379       Low       Low |
 13. | .2610629       Low       Low |
 14. | .3855471       Low       Low |
 15. | .3447088       Low       Low |
     +------------------------------+
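The egen, cut() route yields a labeled numeric variable, whereas the question's var2 is a string. If a string is needed, a minimal sketch using Stata's decode command copies the value labels into a new string variable; the name var3s and the seed value are illustrative choices, and the final assert checks agreement with the replace-based var2.

```stata
clear
set obs 15
set seed 2803
gen double var1 = runiform()

* Reference version built with replace, as in the question
gen var2 = "Lowest" if var1 < .25
replace var2 = "Low" if var1 >= .25 & var1 < .5
replace var2 = "High" if var1 >= .5 & var1 < .75
replace var2 = "Highest" if var1 >= .75

* egen, cut() with integer codes 0-3 and value labels, as above
label define rank 0 "Lowest" 1 "Low" 2 "High" 3 "Highest"
egen var3 = cut(var1), at(0(.25)1) icodes
label values var3 rank

* decode turns the value labels of var3 into a string variable
decode var3, generate(var3s)

* Check that the string version matches the replace-based one
assert var3s == var2
```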