Question

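I am working with an education dataset called IPEDS from the National Center for Education Statistics. It tracks college students by major, degree completion, and so on. My problem in Stata is that I am trying to determine the total number of degrees awarded for a specific major.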

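They have a variable cipcode whose values identify the major. For example, cipcode might be 14.2501 for "Petroleum Engineering", 16.0102 for "Linguistics", and so on.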

When I write code for a specific value, such as

tab cipcode if cipcode==14.2501

Stata reports no observations. What code would give me the total?

/*Convert Float Variable to String Variable and use Force Replace*/
tostring cipcode, gen(cipcode_str) format(%6.4f) force
replace cipcode_str = reverse(substr(reverse(cipcode_str), indexnot(reverse(cipcode_str), "0"), .))
replace cipcode_str = reverse(substr(reverse(cipcode_str), indexnot(reverse(cipcode_str), "."), .))

/* Created a total variable called total_t1 for total count of all stem majors listed in table 1*/
gen total_t1 = cipcode_str== "14.2501" + "14.3901" + "15.0999" + "40.0601"

Answer 1

This minimal example confirms your problem. (By the way, see https://stackoverflow.com/help/mcve for advice on good examples.)

* code 
clear
input code 
14.2501 
14.2501 
14.2501 
end 

tab code if code == 14.2501
tab code if code == float(14.2501)

* results 
. tab code if code == 14.2501
no observations

. tab code if code == float(14.2501)

       code |      Freq.     Percent        Cum.
------------+-----------------------------------
    14.2501 |          3      100.00      100.00
------------+-----------------------------------
      Total |          3      100.00

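The keyword is one you used yourself: precision. In Stata, search precision to find resources, starting with the blog posts by William Gould. A decimal like 14.2501 is hard (in fact impossible) to hold exactly in binary, and the details of holding a variable as type float can bite.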
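To make the precision point concrete, here is a small sketch. It assumes only built-in Stata commands; the variable names code and code2 are made up for illustration. Stata evaluates a literal such as 14.2501 in double precision, while a variable created by input defaults to float, so the two sides of == differ slightly and the final display shows 0.

```stata
* Sketch: the stored float is not the same number as the double literal
clear
input code
14.2501
end

* Show both values with many decimals to expose the difference
display %21.18f 14.2501
display %21.18f code[1]

* The comparison is false, so this displays 0
display 14.2501 == float(14.2501)

* Storing the variable as double avoids the mismatch
clear
input double code2
14.2501
end
count if code2 == 14.2501
```

Reading or recasting the variable as double is an alternative to wrapping each literal in float(); which is more convenient depends on how the data were imported.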

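It is hard to see what you are doing in your last block of code, which you do not explain. The last statement looks puzzling, because you are adding strings. Consider what happens with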

. gen whatever =  "14.2501" + "14.3901" + "15.0999" + "40.0601"

. di whatever[1]
14.250114.390115.099940.0601

The result is one long string, which cannot be a valid cipcode. I suspect that you are reaching towards

 ... if inlist(cipcode_str, "14.2501", "14.3901", "15.0999", "40.0601")

which is quite different.

But using float() is the minimal trick for this problem.
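As a rough sketch of how the float() trick might produce the totals asked for, the toy data below stand in for the real file. The variable awards is hypothetical (the actual IPEDS count variable may be named differently), and stem is an indicator invented here for the four codes listed in the question.

```stata
* Toy data: cipcode is stored as float (the default); awards is a hypothetical count
clear
input cipcode awards
14.2501 10
14.2501 5
16.0102 7
40.0601 3
end

* Number of observations for one major
count if cipcode == float(14.2501)

* Flag the majors of interest, rounding each literal to float precision
generate byte stem = inlist(cipcode, float(14.2501), float(14.3901), float(15.0999), float(40.0601))

* Total of the hypothetical awards variable across the flagged majors
summarize awards if stem, meanonly
display r(sum)
```

With these toy values, count reports 2 observations and the displayed sum is 18. The same pattern works on the string version via inlist(cipcode_str, ...), but then no float() is needed.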
