Question

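I am just getting started with Stata. I only began using it this week and am trying to work through some examples. I have the following dataset: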

contruse | educ_none | educ_prim | educ_secabove
1        | 0         | 1         | 0
0        | 1         | 0         | 0
...

From that dataset I created the following variable so that I can tab contruse against all the different education levels.

gen education=0
replace education=1 if educ_none==1
replace education=2 if educ_prim==1
replace education=3 if educ_secabove==1
replace education=. if educ_none==. | educ_prim==. | educ_secabove==.
tab education, missing

contruse | educ_none | educ_prim | educ_secabove | education
1        | 0         | 1         | 0             | 2
0        | 1         | 0         | 0             | 1

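Basically, is there a better way to do this? My varlist could be arbitrarily large, and doing the above by hand would be painful. Is there a way to do the reverse of the following, that is, to loop over multiple variables and assign values to a single variable?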

foreach x of varlist educ_none educ_prim educ_secabove {
    replace `x' = . if var > 3
}

Answer 1

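Can you automate this process? The answer is "No", because each component variable has a unique suffix. So if you also have "race_black", "race_hisp_nonw", and "race_white", for example, you cannot process the "education" and "race" variables in the same way. You will also have to assign unique value labels to each variable. See the second answer below.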

Two other issues:

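Reading your example, it appears that education has exactly three categories, so by starting from education=0 you are initializing a category that doesn't exist.
Your handling of missing values may be incorrect. You set education to missing if any component is missing. An interviewer might have correctly coded one of the component variables as "1" and left the others blank (missing) when they should have been coded "0". Education should not be set to missing for such an observation.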
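
To make the second point concrete, here is a minimal sketch that runs the question's code on two made-up observations. The data are purely illustrative; the second observation has educ_prim coded 1 and the other indicators left blank.

```stata
clear
input educ_none educ_prim educ_secabove
0 1 0
. 1 .
end

* The approach from the question
gen education = 0
replace education = 1 if educ_none == 1
replace education = 2 if educ_prim == 1
replace education = 3 if educ_secabove == 1
replace education = . if educ_none == . | educ_prim == . | educ_secabove == .

* The second observation ends up missing even though educ_prim is 1
list
```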

Here is my take on the code:

set linesize 100
clear
input id educ_none educ_prim educ_secabove
1 0 1 0
2 1 0 0
3 0 0 1
4 . 1 .    /* Okay */
5 . . .    /* Really Missing */
6 0 0 0    /* Really Missing */
7 . 1 1    /* Illegal */
end

egen etot = rowtotal(educ_*) /* = 1 for valid values */
foreach x of varlist educ_* {
/* Tentatively fix incorrect missings */
    replace `x'= 0 if `x'==. & etot==1
    }
list
gen   education = 1 if educ_none==1
replace education=2 if educ_prim==1
replace education=3 if educ_secabove==1


/* Assign extended missing for illegal values*/
replace education = .a if etot >1 & etot<.
#delim ;
label define educl
    1 "None"
    2 "Primary"
    3 "Secondary+"
    .a  ">1 indicator is 1"
 ;
#delim cr
label values education educl
list
tab education, missing
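
If the indicators are stored in the dataset in the intended category order, the gen/replace statements for education can also be written as a loop with a counter, which scales to a long varlist. This is only a sketch under that assumption, shown on clean data without the missing-value fixes; the macro name k is illustrative.

```stata
clear
input id educ_none educ_prim educ_secabove
1 0 1 0
2 1 0 0
3 0 0 1
end

* Start from missing so that no non-existent category 0 is created
gen education = .

* k counts the position of each indicator within educ_*
local k = 0
foreach x of varlist educ_* {
    local ++k
    replace education = `k' if `x' == 1
}

list
```

Note that if more than one indicator equals 1, the last one in the varlist wins, so the etot check above is still needed for such observations.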

Answer 2

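An automated approach (2014-06-02)

After stating that the process of creating and labeling the new variables could not be automated, I decided to give it a try. I found two useful commands on SSC: Roger Newson's varlabdef and Daniel Klein's labvalch3. Both can be downloaded from within Stata, e.g. ssc install varlabdef.

I assume, as in the original example, that each 0-1 variable name has the form "root_suffix", and that exactly one of the variables sharing a root has the value 1. The goal is to create a new variable for each root whose value corresponds to the position of the indicator variable (if any) that equals 1. The user first creates a local macro containing all the roots. The program loops over the roots, creating one variable on each pass; the inner loop implements Nick's solution (B); varlabdef creates a value label from the names of the original indicators; and labvalch3 strips everything but the suffix and capitalizes each item. The value label is then assigned to the new variable with a label values statement. Outside the loop, the new variables are given variable labels with label variable.

In the example below there are two "roots", educ and gender. The variables with the root "gender", for example, are gender_male and gender_female. A new variable gender is initialized and then assigned the value 1 for males and 2 for females. A corresponding value label (also named "gender") is defined and attached to the new variable, and the variable itself is labeled "Gender".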

clear
input id educ_none educ_prim educ_secabove  gender_male gender_female
1 0 1 0  1 0
2 1 0 0  1 0
3 0 0 1  0 1
4 0 1 0  1 0
end

/* Create local macro to hold root names */
local roots educ gender

/* Loop over each root */
foreach v of local roots {
   qui gen `v' = 0  /* Initialize new variable from root */

    /* Get number of component variables */
   qui ds `v'_*
   local wc : word count `r(varlist)'

   /* Create new variables */
   forvalues k = 1/`wc' {
      /* z`k' is the k-th component variable */
      local z`k' : word `k' of `r(varlist)'  /* extended macro */
      qui replace `v' = `v'+`k'*`z`k''
      }
   /* Total components to check for missing/illegal values*/
   egen `v'tot = rowtotal(`v'_*)
   replace `v' = . if `v'tot != 1
   replace `v' = .a if `v'tot>1 & `v'tot<.
   /* Create value labels from variable names. Note that
      value labels can have the same names as
      the variables they label */

   /* Create a value label consisting of the component variable names */
   varlabdef `v', vlist(`v'_*) from(name)
   label define `v' .a "Illegal", add

   /* Remove the roots from the labels and capitalize */
  labvalch3 `v', subst("`v'_" "")
  labvalch3 `v', strfcn(proper("@"))
  /* Assign the value labels to the new variables */
   label values `v' `v'
}
/* Give nice labels to the new variables */
label var educ "Education"
label var gender "Gender"

label list
tab educ
tab gender

The results are:

. label list
gender:
           1 Male
           2 Female
          .a Illegal
educ:
           1 None
           2 Prim
           3 Secabove
          .a Illegal

. tab educ

  Education |      Freq.     Percent        Cum.
------------+-----------------------------------
       None |          1       25.00       25.00
       Prim |          2       50.00       75.00
   Secabove |          1       25.00      100.00
------------+-----------------------------------
      Total |          4      100.00

.  tab gender

     Gender |      Freq.     Percent        Cum.
------------+-----------------------------------
       Male |          3       75.00       75.00
     Female |          1       25.00      100.00
------------+-----------------------------------
      Total |          4      100.00
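
For comparison, here is a rough sketch of the same idea using only built-in commands (unab, label define, and the string functions subinstr() and proper()) instead of the two SSC commands. It assumes the same "root_suffix" naming and, to stay short, leaves out the check for missing or illegal combinations; the macro names comps, k, and suffix are illustrative.

```stata
clear
input id educ_none educ_prim educ_secabove gender_male gender_female
1 0 1 0 1 0
2 1 0 0 1 0
3 0 0 1 0 1
4 0 1 0 1 0
end

local roots educ gender
foreach v of local roots {
    * Expand the wildcard once and keep the component names in a macro
    unab comps : `v'_*
    gen `v' = .
    local k = 0
    foreach z of local comps {
        local ++k
        replace `v' = `k' if `z' == 1
        * Label text: drop the "root_" prefix and capitalize what is left
        local suffix = proper(subinstr("`z'", "`v'_", "", 1))
        label define `v' `k' "`suffix'", add
    }
    * The value label has the same name as the new variable
    label values `v' `v'
}

label list
```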

Answer 3

In addition to Steve Samuels' excellent advice, three standard devices in this territory are:

A. Use recode. See its help.

B.

gen education = educ_none + 2 * educ_prim + 3 * educ_secabove

(which works if and only if at most one indicator is 1)

C.

gen education = cond(educ_secabove == 1, 3, ///
                cond(educ_prim == 1, 2, ///
                cond(educ_none == 1, 1, .)))

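Notes:

C1. The code above is a single statement. The layout is only there to make the structure visible.

C2. Just as in elementary algebra, every left parenthesis ( carries a promise to match it with a right parenthesis ). Nesting calls to cond() does not change that.

C3. There is more on cond() at http://www.stata-journal.com/sjpdf.html?articlenum=pr0016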
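
As a small illustration of how (B) and (C) differ, the sketch below applies both to made-up data in which the last observation has two indicators equal to 1. The variable names education_b, education_c, and nset are illustrative only.

```stata
clear
input educ_none educ_prim educ_secabove
0 1 0
1 0 0
0 0 1
0 1 1
end

* (B): weighted sum of the indicators
gen education_b = educ_none + 2 * educ_prim + 3 * educ_secabove

* (C): nested cond(), where the highest category that is set wins
gen education_c = cond(educ_secabove == 1, 3, ///
                  cond(educ_prim == 1, 2, ///
                  cond(educ_none == 1, 1, .)))

* Number of indicators equal to 1 in each observation
egen nset = rowtotal(educ_none educ_prim educ_secabove)

list

* Observations for which (B) does not yield a valid category code
list if nset > 1
```

With exactly one indicator set, the two agree. In the last observation (B) adds 2 and 3, which is not a category code, while (C) returns the highest category, so counting the indicators first is a cheap way to find such cases.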

Replace the value of a variable with values of other variables (Stata 13)
