Question

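I have imported a Stata dta file into R using the readstata13 package.

The variables have notes that contain the full question wording. I found the attr() function, which lets me do a few things, such as extracting variable names (attr(df, "names")), variable labels (attr(df, "var")), and value labels (attr(df, "label")). However, I have not found a way to extract the variable notes.

Is there a way to do this?

Below are a few lines of Stata code that generate a dta file with two variables and variable notes, which can then be imported into R.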

clear
input int(mpg weight)
34 1800
18 3670
21 4060
15 3720
19 3400
41 2040
25 1990
28 3260
30 1980
12 4720
end
note mpg: Mileage (mpg)
note weight: Weight (lbs.)
save "~/mpg_weight.dta", replace

Answer 1

Edit:

Actually, you can do this directly in newer versions of readstata13, as follows:

library(readstata13)
df = read.dta13("~/mpg_weight.dta")
notes = attr(df, "expansion.fields")

This produces a list giving the variable name, the characteristic name, and the contents of each Stata characteristic field.
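As a rough sketch of how that list might be turned into one set of notes per variable: the `fields` object below is a hand-made stand-in for `attr(df, "expansion.fields")`, and its layout (three strings per entry, in the order variable name, characteristic name, contents) is an assumption based on the description above. The names `fields`, `fields_df` and `notes_by_var` are hypothetical. Stata stores notes as characteristics named `note1`, `note2`, and so on, with `note0` holding the count, so the sketch keeps only the numbered entries.

```r
# Hypothetical stand-in for attr(df, "expansion.fields"); the element layout
# (variable name, characteristic name, contents) is an assumption.
fields <- list(
  list("mpg", "note1", "Mileage (mpg)"),
  list("mpg", "note0", "1"),
  list("weight", "note1", "Weight (lbs.)"),
  list("weight", "note0", "1")
)

# Flatten each entry to a plain character vector and stack the entries as rows.
m <- do.call(rbind, lapply(fields, function(x) unname(unlist(x))))
fields_df <- data.frame(
  variable = m[, 1],
  characteristic = m[, 2],
  contents = m[, 3],
  stringsAsFactors = FALSE
)

# Keep note1, note2, ...; note0 only stores the number of notes.
is_note <- grepl("^note[1-9][0-9]*$", fields_df$characteristic)
notes_by_var <- split(fields_df$contents[is_note], fields_df$variable[is_note])
```

With a real file, replacing the hand-made `fields` with `attr(df, "expansion.fields")` should be the only change needed, provided the list really has this shape.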

The following is a quick workaround using a toy example, run in Stata:

clear

input int(mpg weight)
34 1800
18 3670
21 4060
15 3720
19 3400
41 2040
25 1990
28 3260
30 1980
12 4720
end

note mpg: this is the first note
note mpg: and this is the second
note mpg: here's a third
note weight: Weight (lbs.)
save "~/mpg_weight.dta", replace

ds
local varlist `r(varlist)'

foreach var of local varlist {
    generate notes_`var' = ""
    forvalues i = 1 / ``var'[note0]' {
        replace notes_`var' = "``var'[note`i']'" in `i'
    }
}

export delimited notes_* using notes_mpg_weight.dta.csv, replace

You can then simply import everything into R as strings and go from there.
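A minimal sketch of that import step, assuming the exported CSV has one `notes_*` column per variable and one row per observation, with empty cells once a variable runs out of notes. To keep the example self-contained it writes a small stand-in file to a temporary path instead of reading the real export; `csv_path`, `notes_raw` and `notes` are hypothetical names.

```r
# Hypothetical stand-in for the exported CSV: one notes_* column per variable,
# one row per observation, empty where a variable has no further notes.
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  "notes_mpg,notes_weight",
  "this is the first note,Weight (lbs.)",
  "and this is the second,",
  "here's a third,",
  ",",
  ","
), csv_path)

# Read every column as character so nothing is coerced to a number or factor.
notes_raw <- read.csv(csv_path, colClasses = "character")

# Drop the empty padding cells and strip the notes_ prefix from the names.
notes <- lapply(notes_raw, function(col) col[nzchar(col)])
names(notes) <- sub("^notes_", "", names(notes))
```

The result is a named list with one character vector of notes per variable, so `notes$mpg` holds the three notes attached to mpg.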

