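My problem is that neither egen sum() nor egen total() sums correctly.
The variable I am trying to sum, eqvalueusd, was imported from a .csv file and is stored as str20; I then used encode to put it into a new variable called marketusd, which is of type long (format %16.0g).
I want the sum of all the values in this variable.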
What I have tried so far (none of it worked):
1: summarize marketusd, meanonly followed by display r(sum), which shows the wrong sum
2: egen sum = sum(marketusd) and egen sum = total(marketusd), which put the wrong sum into the new variable
3: egen double sum = sum(marketusd), egen double sum = total(marketusd), egen float sum = sum(marketusd) and egen float sum = total(marketusd), which also put the wrong sum into the new variable
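4: I also combined the above with other ways of converting the original variable instead of encode. generate newvar = real(eqvalueusd) filled newvar with missing values ("."), and destring eqvalueusd, replace returned the error message contains nonnumeric characters (which is also strange, because eqvalueusd contains only numeric characters).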
5: When I copy the marketusd data and compute the sum in Excel, I get the correct number. So whatever is going wrong, it is in my Stata code.
* Example generated by -dataex-. To install: ssc install dataex
clear
input str20 eqvalueusd long marketusd float newvar
"4.606.727,95" 424 192510
"1.132.456,29" 27 192510
"5.596.517,16" 472 192510
"3.292.918,01" 339 192510
"6.748.732,71" 512 192510
"6.139.518,59" 500 192510
"2.927.484,83" 272 192510
"11.474.461,50" 168 192510
"1.253.316,28" 54 192510
"1.717.925,38" 130 192510
"1.336.232,93" 73 192510
"4.863.581,14" 433 192510
"4.370.258,61" 412 192510
"1.526.748,61" 105 192510
"20.280.664,62" 276 192510
"5.643.416,65" 476 192510
"2.098.390,40" 228 192510
"2.853.095,83" 268 192510
"72.674,07" 549 192510
"224.362,66" 298 192510
"2.358.856,20" 238 192510
"0,37" 1 192510
"5.761.013,33" 479 192510
"1.421.174,00" 89 192510
"6.315.874,00" 503 192510
"1.458.139,03" 96 192510
"7.310.413,83" 535 192510
"2.203.177,49" 231 192510
"1.176.210,24" 38 192510
"1.252.117,44" 53 192510
"20.824.291,28" 284 192510
"3.338.046,79" 341 192510
"3.756.050,52" 361 192510
"5.676.796,74" 477 192510
"1.560.603,03" 114 192510
"534.372,36" 487 192510
"29.592.046,17" 323 192510
"4.281.136,11" 409 192510
"821.142,00" 578 192510
"2.535.309,35" 248 192510
"23.026.731,10" 301 192510
"49.629.060,26" 458 192510
"1.052.654,93" 11 192510
"1.001.017,50" 2 192510
"3.483.488,91" 349 192510
"370.816.160,01" 388 192510
"7.716.727,72" 542 192510
"3.432.478,63" 344 192510
"28.481.992,67" 318 192510
"369.580,98" 385 192510
"9.975.296,70" 599 192510
"6.136.398,05" 499 192510
"6.791.545,74" 514 192510
"8.349.073,42" 563 192510
"19.297.647,24" 219 192510
"2.900.280,82" 271 192510
"3.798,33" 363 192510
"4.129.903,95" 403 192510
"831.718,20" 579 192510
"18.559.520,16" 215 192510
"7.937.960,14" 544 192510
"14.267.003,27" 191 192510
"1.326.491,92" 69 192510
"13.011,32" 183 192510
"993.512,11" 620 192510
"4.772.173,35" 430 192510
"14.772,85" 194 192510
"5.204.176,80" 464 192510
"25.717.006,99" 312 192510
"2.346.906,70" 237 192510
"9.675.531,03" 596 192510
"3.557.999,40" 352 192510
"1.711.335,49" 129 192510
"5.324.698,44" 465 192510
"98.745.322,26" 615 192510
"5.421.793,96" 468 192510
"24.111.888,32" 309 192510
"20.720.051,22" 282 192510
"46.803.838,01" 453 192510
"20.820.859,94" 283 192510
"1.504.028,44" 102 192510
"2.301.295,57" 234 192510
"5.478.638,14" 471 192510
"6.062.898,51" 496 192510
"756.133,96" 554 192510
"8.147.619,93" 561 192510
"50.793.535,72" 486 192510
"840.738,25" 581 192510
"1.363.147,24" 81 192510
"7.306.628,55" 534 192510
"74.690,62" 552 192510
"1.354.018,89" 76 192510
"1.141.966,42" 31 192510
"2.055.183,94" 224 192510
"7.980.821,15" 545 192510
"244.754,81" 310 192510
"1.458.217,93" 97 192510
"7.518.664,69" 539 192510
"1.875.695,95" 148 192510
"2.190.106,38" 230 192510
end
label values marketusd a
label def a 1 "0,37", modify
label def a 2 "1.001.017,50", modify
label def a 11 "1.052.654,93", modify
label def a 27 "1.132.456,29", modify
label def a 31 "1.141.966,42", modify
label def a 38 "1.176.210,24", modify
label def a 53 "1.252.117,44", modify
label def a 54 "1.253.316,28", modify
label def a 69 "1.326.491,92", modify
label def a 73 "1.336.232,93", modify
label def a 76 "1.354.018,89", modify
label def a 81 "1.363.147,24", modify
label def a 89 "1.421.174,00", modify
label def a 96 "1.458.139,03", modify
label def a 97 "1.458.217,93", modify
label def a 102 "1.504.028,44", modify
label def a 105 "1.526.748,61", modify
label def a 114 "1.560.603,03", modify
label def a 129 "1.711.335,49", modify
label def a 130 "1.717.925,38", modify
label def a 148 "1.875.695,95", modify
label def a 168 "11.474.461,50", modify
label def a 183 "13.011,32", modify
label def a 191 "14.267.003,27", modify
label def a 194 "14.772,85", modify
label def a 215 "18.559.520,16", modify
label def a 219 "19.297.647,24", modify
label def a 224 "2.055.183,94", modify
label def a 228 "2.098.390,40", modify
label def a 230 "2.190.106,38", modify
label def a 231 "2.203.177,49", modify
label def a 234 "2.301.295,57", modify
label def a 237 "2.346.906,70", modify
label def a 238 "2.358.856,20", modify
label def a 248 "2.535.309,35", modify
label def a 268 "2.853.095,83", modify
label def a 271 "2.900.280,82", modify
label def a 272 "2.927.484,83", modify
label def a 276 "20.280.664,62", modify
label def a 282 "20.720.051,22", modify
label def a 283 "20.820.859,94", modify
label def a 284 "20.824.291,28", modify
label def a 298 "224.362,66", modify
label def a 301 "23.026.731,10", modify
label def a 309 "24.111.888,32", modify
label def a 310 "244.754,81", modify
label def a 312 "25.717.006,99", modify
label def a 318 "28.481.992,67", modify
label def a 323 "29.592.046,17", modify
label def a 339 "3.292.918,01", modify
label def a 341 "3.338.046,79", modify
label def a 344 "3.432.478,63", modify
label def a 349 "3.483.488,91", modify
label def a 352 "3.557.999,40", modify
label def a 361 "3.756.050,52", modify
label def a 363 "3.798,33", modify
label def a 385 "369.580,98", modify
label def a 388 "370.816.160,01", modify
label def a 403 "4.129.903,95", modify
label def a 409 "4.281.136,11", modify
label def a 412 "4.370.258,61", modify
label def a 424 "4.606.727,95", modify
label def a 430 "4.772.173,35", modify
label def a 433 "4.863.581,14", modify
label def a 453 "46.803.838,01", modify
label def a 458 "49.629.060,26", modify
label def a 464 "5.204.176,80", modify
label def a 465 "5.324.698,44", modify
label def a 468 "5.421.793,96", modify
label def a 471 "5.478.638,14", modify
label def a 472 "5.596.517,16", modify
label def a 476 "5.643.416,65", modify
label def a 477 "5.676.796,74", modify
label def a 479 "5.761.013,33", modify
label def a 486 "50.793.535,72", modify
label def a 487 "534.372,36", modify
label def a 496 "6.062.898,51", modify
label def a 499 "6.136.398,05", modify
label def a 500 "6.139.518,59", modify
label def a 503 "6.315.874,00", modify
label def a 512 "6.748.732,71", modify
label def a 514 "6.791.545,74", modify
label def a 534 "7.306.628,55", modify
label def a 535 "7.310.413,83", modify
label def a 539 "7.518.664,69", modify
label def a 542 "7.716.727,72", modify
label def a 544 "7.937.960,14", modify
label def a 545 "7.980.821,15", modify
label def a 549 "72.674,07", modify
label def a 552 "74.690,62", modify
label def a 554 "756.133,96", modify
label def a 561 "8.147.619,93", modify
label def a 563 "8.349.073,42", modify
label def a 578 "821.142,00", modify
label def a 579 "831.718,20", modify
label def a 581 "840.738,25", modify
label def a 596 "9.675.531,03", modify
label def a 599 "9.975.296,70", modify
label def a 615 "98.745.322,26", modify
label def a 620 "993.512,11", modify
Answer (score: 3)
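The main problem is that encode is quite wrong for this kind of string variable.
Evidently eqvalueusd holds numeric information, but with periods . used as thousands separators and commas , used as decimal points.
When you use encode without further instructions, the distinct strings are mapped, in alphanumeric order, to the integers 1 upwards, and each string becomes a value label in its own right. The dataex example shows how this produces nonsense. The resulting integers are not even necessarily in the right order, as in this example:
label def a 219 "19.297.647,24", modify
label def a 224 "2.055.183,94", modify
"2.055.183,94" (evidently about 2 million) sorts after "19.297.647,24" (evidently about 19 million), because the sort is alphanumeric, or lexicographic. It follows inevitably that a sum obtained by adding those integers is nonsense too.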
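The way to convert such strings to a numeric variable is destring, but in a case like this you usually need to study the help and use the appropriate options.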
destring eqvalueusd , dpcomma ignore(.) gen(wanted)
indicates the kind of solution needed.
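Here is a sketch of that destring call on three values from the question's data, followed by the sum. The variable names wanted and total_usd are illustrative. The sketch assumes, as the call above implies, that ignore(".") removes the thousands separators and dpcomma then reads the comma as the decimal point. List the results and check them against the original strings before relying on them.

```stata
* Three values taken from the question's data
clear
input str20 eqvalueusd
"4.606.727,95"
"1.132.456,29"
"0,37"
end

* ignore(".") strips the thousands separators; dpcomma reads "," as the decimal point
destring eqvalueusd, dpcomma ignore(".") generate(wanted)

* Show the amounts with two decimals and thousands separators, and eyeball them
format wanted %18.2fc
list

* Sum the converted values; egen total() into a double variable also works
summarize wanted, meanonly
display %18.2fc r(sum)
egen double total_usd = total(wanted)
```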
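In a nutshell, encode is for categorical variables, where strings such as "male" and "female" are to be mapped to 1 and 2 or to whatever other integers are wanted. (0 and 1 are excellent choices for binary categorical variables.)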
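For contrast, the sketch below shows the use encode is designed for. The variable sex and the label name femalelbl are hypothetical. By default the codes follow alphabetical order (female = 1, male = 2). Defining a value label first and passing it through encode's label() option lets you choose the codes, here 0 and 1.

```stata
* Hypothetical categorical string variable
clear
input str6 sex
"male"
"female"
"female"
end

* Default: codes follow alphabetical order of the strings (female = 1, male = 2)
encode sex, generate(sex_code)

* Chosen codes: define the value label first, then have encode use it
label define femalelbl 0 "male" 1 "female"
encode sex, generate(female) label(femalelbl)

list sex sex_code female, nolabel
```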
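For a discussion of string variables that should be numeric, see here, or read the help for destring and encode and check carefully whether the results are what you want. Note that generate newvar = real(oldvar) is a good solution only if there are no non-numeric characters (otherwise there would be no point to destring!), and only if precision is not also a problem, as discussed next.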
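The sketch below shows why real() produced only missing values in the question. It also shows one way round the problem, which is an alternative to destring's options rather than the approach recommended above: strip the periods with subinstr(), turn the comma into a period, and only then call real(). The variable names naive and cleaned are illustrative.

```stata
* Two values taken from the question's data
clear
input str20 eqvalueusd
"4.606.727,95"
"0,37"
end

* real() cannot read "." as a thousands separator or "," as a decimal point,
* so every value comes out missing
generate double naive = real(eqvalueusd)

* Clean first: drop every ".", then change "," to ".", then convert
generate double cleaned = real(subinstr(subinstr(eqvalueusd, ".", "", .), ",", ".", .))

list
```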
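A second problem, precision, often arises too. Is the storage type of any new variable adequate to hold the new values without loss of precision? The problem is especially acute for variables holding monetary amounts, which can be large and also carry detail such as cents as well as dollars. Understandably, users want quantities such as totals to be reproduced exactly. For that, the usual advice in Stata is to stick to the double storage type. If the totals are integers, the long storage type usually works fine.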
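The following sketch illustrates the precision issue using the largest amount in the question's data. A float holds only about 7 significant digits, so storing 370.816.160,01 as float gives 370816160: at this magnitude, adjacent float values are 32 apart. A double keeps the cents. The variable names f and d are illustrative.

```stata
* The largest amount in the question's data
clear
input str20 eqvalueusd
"370.816.160,01"
end

* Same conversion, two storage types
generate float  f = real(subinstr(subinstr(eqvalueusd, ".", "", .), ",", ".", .))
generate double d = real(subinstr(subinstr(eqvalueusd, ".", "", .), ",", ".", .))

* f shows 370816160.00, d shows 370816160.01
format f d %20.2f
list f d
```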
See also this thread for why encode is a very bad idea for dates. (The issue is also discussed in the previous reference.)
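Note. You cite the sum() function of egen and the total() function of egen, but they are the same. If you
viewsource _gsum.ado
you will see that the former is just a wrapper for _gtotal.ado. So what happened there? Before Stata 9 the name used was sum(), but it was realised that this name was too close to that of the function sum(), which can be used with generate and produces cumulative or running sums. By contrast, egen's total() produces a single value for each block of observations, namely the total of the values fed to it.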
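The small sketch below, on made-up data, shows the difference between the sum() function used with generate and egen's total(). The variables g and x and the new variable names are hypothetical. generate with sum() gives a running sum. egen with total() puts the same total in every observation, or in every observation of each by() group.

```stata
* Made-up data: two groups g, values x
clear
input g x
1 1
1 2
2 3
2 4
end

* Running (cumulative) sum: 1, 3, 6, 10
generate running = sum(x)

* One total, repeated in every observation: 10
egen double total_x = total(x)

* One total per group: 3 for g == 1, 7 for g == 2
egen double total_g = total(x), by(g)

list
```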
Key fact: every egen function is defined by an .ado file, the rule being that egen function foo() is defined by _gfoo.ado.
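Hence the sum() function of egen was renamed total(), and sum() became undocumented. So it is better to use total(), although sum() continues to work and appears often in code. Many Stata programmers started before Stata 9, and many others have seen sum() in code and copied it.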