Question

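I'm learning R and am just trying to read a Stata data file, but I get the following error:

X <- Stata.file(Stata_File)

Error in nchar(varlabs) : invalid multibyte string 253

Several Mac users here hit this error, but the same program runs fine on a PC. Googling the error suggests it is related to an R package, but I couldn't find a solution. Any ideas? Thanks for your help!

The R code up to the point of the error is as follows: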

Root   <- "/Users/Desktop/R_Training"
PathIn <- paste(Root,"Data/Example_0",sep="/")

# The 2007 Dominican Republic household member file (96 MB) 
Stata_File <- "drpr51fl.dta"

# Load the memisc package:
library(memisc)

# Set the working directory:
setwd(PathIn)

# (1) Determine which variables we want:
# The Stata.file function (from memisc) reads the "header" 
#  of our Stata file so you can see what it contains
#  and choose the variables you want.
X <- Stata.file(Stata_File)

**Error in nchar(varlabs) : invalid multibyte string 253**

Here is my session info:
R version 2.13.1 (2011-07-08) Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)

locale: [1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8

attached base packages: [1] grid stats graphics grDevices utils datasets [7] methods base

other attached packages: [1] memisc_0.95-33 MASS_7.3-13 lattice_0.19-30

Answer 1

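This worked for me. You can force R to recognize every character by issuing the following command:

Sys.setlocale('LC_ALL', 'C')

Now run the previous command again and everything should be fine.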
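
If you would rather not leave the whole session in the C locale, the same idea can be scoped to a single call. The following is only a sketch: `with_c_ctype` is a hypothetical helper name, and it assumes that switching `LC_CTYPE` alone (rather than `LC_ALL`) is enough to avoid the error, which has not been checked against this particular file.

```r
# Hypothetical helper: run f() with LC_CTYPE temporarily set to "C",
# then restore the previous setting even if f() fails.
with_c_ctype <- function(f) {
  old <- Sys.getlocale("LC_CTYPE")
  on.exit(Sys.setlocale("LC_CTYPE", old), add = TRUE)
  Sys.setlocale("LC_CTYPE", "C")
  f()
}

# Intended usage with the code from the question (requires memisc and the data file):
# X <- with_c_ctype(function() Stata.file(Stata_File))
```

Note that in the C locale non-ASCII labels are handled as raw bytes, so accented characters may still display incorrectly until they are converted.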

Answer 2

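It looks like the string encoding in the file is not what the program thinks it is. I'm guessing the file was generated on a PC? Does it contain non-ASCII column names or data strings?

Since you appear to be using UTF-8, and (US/Western European) PCs usually use latin-1, that could be the problem. I'd expect the same problem on Linux (which is also UTF-8).

Possible workaround: does the Stata.file method have an "encoding" option? If so, you could try 'latin1' and hope for the best.

Another possibility is to start R with the --encoding=latin1 option.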
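
The suspected mismatch can be reproduced without the Stata file. This is a minimal sketch using a made-up label ("año" stored as latin-1 bytes), not data from the file in the question; it assumes R 3.3.0 or later for `validUTF8()`.

```r
# "año" as latin-1 bytes: a = 0x61, ñ = 0xF1, o = 0x6F (illustrative data only)
x <- rawToChar(as.raw(c(0x61, 0xf1, 0x6f)))

# The bytes are not valid UTF-8, which is why nchar(x) can fail
# with "invalid multibyte string" in a UTF-8 locale.
valid_before <- validUTF8(x)

# Counting bytes never needs to decode the string, so it is always safe.
n_bytes <- nchar(x, type = "bytes")

# Declare the real source encoding and convert to UTF-8.
y <- iconv(x, from = "latin1", to = "UTF-8")
valid_after <- validUTF8(y)
```

After conversion, `ñ` takes two bytes in UTF-8, so `y` is four bytes long and character-level functions such as `nchar()` can decode it normally.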

nchar() error when reading a Stata file in R on a Mac