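Matching variable names with their corresponding values in a different dataset

Time: 2014-04-01 00:15:47

Tags: stata

I am trying to use a macro to import variable names from an external dataset, match those names with the corresponding values in my master file, and then export the results of a looped principal component analysis with esttab.

My code looks like this.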

preserve

forvalue file = 537(3)647 {

    import excel "C:\Users\M\Dropbox\Masterarbeit\Stata12\test/`file'.xls", sheet("Sheet1") firstrow clear

    local x ""
    foreach var of varlist *SA {
        local x `x' `var'
    }

    clear
    restore

    forvalue z = 537(3)647 {
        pca `x' if rMonth < `z'+3, comp(1)
        esttab e(L) using pc`z'.csv, replace
    }
}

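The command should loop through the files defined in the first loop, capture the variable names in each file, match those variables with the corresponding values in the master file (the variable names are identical), and then run the pca. After that, it should build a new list of variable names from the next Excel file and use those variables in the pca. As it stands, the code only works if the values are also present in the external dataset.

The problem is that I cannot find a way to match the variable names in the external file with those in the master file. I only get the error "no variables defined", because the external file contains only the names of the variables, not their values.

Any suggestions on how to tell Stata to look up the variable names from the external file and use their values for the pca?

Edit: Before that, my code generates the variables, regresses them on the dependent variable, ranks them by t-value, and exports them to the files I use to get the varlist. The code looks like this: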

set excelxlsxlargefile on

cd C:\Users\M\Dropbox\Masterarbeit\Stata12\sentiment_6m

import excel "C:\Users\M\Dropbox\Masterarbeit\Daten\Dataimport\sentiments\Google Query CDX.xlsx", sheet("Tabelle1") firstrow

set more off

gen Month = month( Date)

gen     January     =   1   if  Month   ==  1
gen     February    =   1   if  Month   ==  2
gen     March   =   1   if  Month   ==  3
gen     April   =   1   if  Month   ==  4
gen     May =   1   if  Month   ==  5
gen     June    =   1   if  Month   ==  6
gen     July    =   1   if  Month   ==  7
gen     August  =   1   if  Month   ==  8
gen     September   =   1   if  Month   ==  9
gen     October =   1   if  Month   ==  10
gen     November    =   1   if  Month   ==  11
gen     December    =   1   if  Month   ==  12
replace     January     =   0   if  January     ==  .
replace     February    =   0   if  February    ==  .
replace     March   =   0   if  March   ==  .
replace     April   =   0   if  April   ==  .
replace     May =   0   if  May ==  .
replace     June    =   0   if  June    ==  .
replace     July    =   0   if  July    ==  .
replace     August  =   0   if  August  ==  .
replace     September   =   0   if  September   ==  .
replace     October =   0   if  October ==  .
replace     November    =   0   if  November    ==  .
replace     December    =   0   if  December    ==  .


foreach var of varlist *_qry{  
sum `var', meanonly
local mu =r(mean)
reg `var' January  February March April May June July August September October November December, nocons
predict double `var'SA, residual
replace `var'SA=`var'SA+`mu'
egen sd = sd(`var'SA)
replace `var'SA=`var'SA/sd
drop sd
drop `var'
}



* BIG LOOP *

generate double rMonth = mofd( Date)
global tflist ""

forvalue y = 537(3)647{


foreach var of varlist *SA{
reg MidCDX `var' if rMonth<=`y'
tempfile tfcur
parmest, idstr("`var'") saving(`"`tfcur'"', replace) flis(tflist) 
}


* Concatenate files into memory (REPLACING THE OLD DATA) *
preserve
clear
append using $tflist
sencode idstr, gene(xvar)
lab var xvar "X-variable"
keybygen xvar, gene(parmseq)
drop if parm=="_cons"
egen rank = rank (-t)
gsort -t
drop if rank>40
save `y', replace
export excel xvar t using `y', firstrow(variables) replace
foreach TF in $tflist {
erase `"`TF'"'
}
global tflist ""
restore

}

3 Answers:

Answer 0 (score: 2)

Maybe this example helps:

clear all
set more off

/*
load two example MS Excel files with var names only and accumulate var names in a local.
files are named varfile.xls and varfile2.xls
*/

foreach i in "" "2" {

    import excel "/home/roberto/Desktop/stata_tests/varfile`i'.xls", firstrow clear

    * get var names
    quietly ds

    * save var names in local
    local myvars `myvars' `r(varlist)'
}

* load database that contains vars and values
sysuse auto, clear

* do pca
pca `myvars'

/*
varfile.xls contains variables "weight" and "price"
varfile2.xls contains variables "mpg" and "length"
*/

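ds does the trick because it picks up the names of the variables found in the MS Excel sheet and stores them in r(varlist). See help ds and help saved results (or help stored results). We then load the "complete" database and run pca using the stored variable names.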

The MS Excel files look like this:

[Image: screenshot of the MS Excel files]

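I think this answers the specific question you asked.

Edit

Looking more closely at your code, I am not sure the problem lies in matching variable names against the complete database. Rather, there seems to be a problem with the way preserve and restore are set up. Don't use that pair of commands; just load the complete database when you need it (with use).

What do you have before preserve? Where does your error appear? Please post more code. A reproducible example would help.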
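A minimal sketch of that suggestion follows. It rests on several assumptions that are not in the question: the master data are saved under the hypothetical name master.dta, the Excel files sit in the working directory, each file carries the variable names in its header row, and one pca per file is what is wanted. esttab comes from the user-written estout package.

```stata
* Sketch only: "master.dta" is a hypothetical file name for the main dataset.
forvalues file = 537(3)647 {

    * read the variable names from the external file (header row assumed)
    import excel "`file'.xls", sheet("Sheet1") firstrow clear
    unab x : *SA

    * load the full data instead of relying on preserve/restore
    use "master.dta", clear

    * the names collected in x now refer to variables that hold values
    pca `x' if rMonth < `file' + 3, components(1)
    esttab e(L) using pc`file'.csv, replace
}
```

If the names are instead stored as values in a column (for example a string column named xvar), something like levelsof xvar, local(x) clean may be a way to collect them in place of unab.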

Edit 2

My guess now is that you have nothing in memory before preserve, so when you restore, you are restoring a blank database. Therefore, trying pca <somevar> gives you:

no variables defined
r(111);

preserve saves the data as they are at the moment the command is issued.
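A short illustration of this behaviour, using the auto dataset shipped with Stata as a stand-in for the master file (an assumption for the sake of the example):

```stata
* Load data BEFORE preserve, so that restore has something to bring back.
sysuse auto, clear
preserve
    clear
    * ... temporarily work with other data here ...
restore

* the auto data are in memory again
describe, short
```

If preserve is issued while memory is empty, restore brings back that empty dataset, which is consistent with the "no variables defined" error.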

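Answer 1 (score: 2)

Meta-comment: there is too much code here for me to want to try to absorb what you are trying to do. I will just comment on some technical details.

This code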

gen January = 1 if Month == 1
gen February = 1 if Month == 2
gen March = 1 if Month == 3
gen April = 1 if Month == 4
gen May = 1 if Month == 5
gen June = 1 if Month == 6
gen July = 1 if Month == 7
gen August = 1 if Month == 8
gen September = 1 if Month == 9
gen October = 1 if Month == 10
gen November = 1 if Month == 11
gen December = 1 if Month == 12
replace January = 0 if January == .
replace February = 0 if February == .
replace March = 0 if March == .
replace April = 0 if April == .
replace May = 0 if May == .
replace June = 0 if June == .
replace July = 0 if July == .
replace August = 0 if August == .
replace September = 0 if September == .
replace October = 0 if October == .
replace November = 0 if November == .
replace December = 0 if December == .

can be rewritten like this

tokenize "`c(Months)'"
forval j = 1/12 { 
    gen ``j'' = Month == `j' 
}

The names of the months January to December are built into c(Months).

This code

sum `var', meanonly
local mu =r(mean)
reg `var' January  February March April May June July August September October November December, nocons
predict double `var'SA, residual
replace `var'SA=`var'SA+`mu'
egen sd = sd(`var'SA)
replace `var'SA=`var'SA/sd
drop sd

can be shortened to

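sum `var', meanonly
local mu = r(mean)
reg `var' January-December, nocons
predict double `var'SA, residual
sum `var'SA    // r(sd) is the SD of the adjusted series, as in the original code
replace `var'SA = (`var'SA + `mu') / r(sd)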

Note that creating a whole variable just to hold the SD is not a good idea. It cancels out any time saved by using summarize, meanonly.
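As a small illustration of the point about summarize, the sketch below uses the auto dataset shipped with Stata (an assumption, chosen only so the example is self-contained). With meanonly, summarize skips the variance calculation, so r(sd) is only available after a plain summarize.

```stata
sysuse auto, clear

* meanonly: fast, returns r(mean) but no r(sd)
summarize price, meanonly
display r(mean)

* plain summarize: returns both r(mean) and r(sd), no extra variable needed
summarize price
display r(mean) "  " r(sd)
```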

I won't comment here on what you are doing statistically, adding the mean and then dividing by the SD.

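Answer 2 (score: 1)

@Roberto Ferrer is addressing your main problem, which hinges on comparing variable names across files. I am adding a detail on the use of local macros and wildcard syntax.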

local x ""
foreach var of varlist *SA {
    local x `x' `var'
}

is a long way to write

unab x : *SA
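A quick way to see what unab does, sketched on the auto dataset shipped with Stata (an assumption, since the *SA variables from the question are not available here); ds with a wildcard is shown alongside for comparison:

```stata
sysuse auto, clear

* expand the wildcard into a full variable list stored in local x
unab x : m*
display "`x'"

* ds leaves the same kind of list in r(varlist)
quietly ds m*
display "`r(varlist)'"
```

Note that unab exits with an error if the wildcard matches no variable in memory.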