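read.fwf(), column names, and two kinds of sep in R

Time: 2014-04-30 07:44:21

Tags: r

I have a problem similar to this issue.

My data looks like this (the first line, the column ruler, is not in my txt file):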

----*----1----*----2----*---
Region                 Value
New York, NY        66,834.6
Kings, NY           34,722.9
Bronx, NY           31,729.8
Queens, NY          20,453.0
San Francisco, CA   16,526.2
Hudson, NJ          12,956.9
Suffolk, MA         11,691.6
Philadelphia, PA    11,241.1
Washington, DC       9,378.0
Alexandria IC, VA    8,552.2

My attempt was:

#fwf data2
path <- "fwfdata2.txt"
data6 <- read.fwf(path, 
            widths=c(17, -3, 8), 
            header=TRUE,
            #sep=""
            as.is=FALSE)
data6

which returns:

> data6
                  Region.................Value
New York, NY                          66,834.6
Kings, NY                             34,722.9
Bronx, NY                             31,729.8
Queens, NY                            20,453.0
San Francisco, CA                     16,526.2
Hudson, NJ                            12,956.9
Suffolk, MA                           11,691.6
Philadelphia, PA                      11,241.1
Washington, DC                         9,378.0
Alexandria IC, VA                      8,552.2
> dim(data6)
[1] 10  1

The problem is that my data is separated by both commas and spaces. When I add sep="", it produces the following error:

Error in read.table(file = FILE, header = header, sep = sep, row.names = row.names,  : 
  more columns than column names

1 answer:

Answer 0 (score: 2)

I think your problem is that read.fwf expects the header to be delimited by sep, while the data itself is fixed width:

header: a logical value indicating whether the file contains the
        names of the variables as its first line.  If present, the
        names must be delimited by ‘sep’.

   sep: character; the separator used internally; should be a
        character that does not occur in the file (except in the
        header).
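To illustrate the documentation excerpt above, here is a minimal sketch that reproduces the single-column result. It assumes a temporary file stands in for fwfdata2.txt; the three sample rows and the names `demo_path` and `bad` are made up for illustration. The header line contains no tab (the default sep), so read.table sees one column name but two fields per data row and treats the first field as row names:

```r
# Build a small fixed-width file: 20 characters for Region, 8 for Value.
demo_lines <- c(
  sprintf("%-20s%8s", "Region", "Value"),
  sprintf("%-20s%8s", "New York, NY", "66,834.6"),
  sprintf("%-20s%8s", "Kings, NY", "34,722.9"),
  sprintf("%-20s%8s", "Washington, DC", "9,378.0")
)
demo_path <- tempfile(fileext = ".txt")  # stand-in for "fwfdata2.txt"
writeLines(demo_lines, demo_path)

# header=TRUE with the default sep="\t": the header is NOT split by widths,
# so the whole first line becomes a single column name.
bad <- read.fwf(demo_path, widths = c(17, -3, 8), header = TRUE)

dim(bad)       # rows x columns of the result
rownames(bad)  # the Region field has been used as row names
```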

I would read the data while skipping the header, then read the header separately by reading only the first line:

> data = read.fwf(path,widths=c(17,-3,8),head=FALSE,skip=1,as.is=TRUE)
> heads = read.fwf(path,widths=c(17,-3,8),head=FALSE,n=1,as.is=TRUE)
> names(data)=heads[1,]
> data
   Region               Value
1  New York, NY      66,834.6
2  Kings, NY         34,722.9
3  Bronx, NY         31,729.8
4  Queens, NY        20,453.0
5  San Francisco, CA 16,526.2
6  Hudson, NJ        12,956.9
7  Suffolk, MA       11,691.6
8  Philadelphia, PA  11,241.1
9  Washington, DC     9,378.0
10 Alexandria IC, VA  8,552.2

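If you want Region as a factor, use as.is=FALSE when reading the data (as in your example), but you must use as.is=TRUE when reading the header, otherwise the header strings are converted to factors and the column names end up as numbers.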
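If the column names are already known, another option is to skip the header line and pass the names through the col.names argument of read.fwf, which avoids reading the header at all. This is only a sketch under assumptions: the temporary file stands in for fwfdata2.txt, the three rows are sample data, and the names `alt_path` and `data7` are made up for illustration.

```r
# Build a small fixed-width file: 20 characters for Region, 8 for Value.
alt_lines <- c(
  sprintf("%-20s%8s", "Region", "Value"),
  sprintf("%-20s%8s", "New York, NY", "66,834.6"),
  sprintf("%-20s%8s", "Kings, NY", "34,722.9"),
  sprintf("%-20s%8s", "Washington, DC", "9,378.0")
)
alt_path <- tempfile(fileext = ".txt")  # stand-in for "fwfdata2.txt"
writeLines(alt_lines, alt_path)

data7 <- read.fwf(
  alt_path,
  widths = c(17, -3, 8),            # 17 chars, skip 3, then 8 chars
  skip = 1,                         # skip the header line
  col.names = c("Region", "Value"), # supply the names directly
  strip.white = TRUE,               # passed on to read.table: trim padding
  stringsAsFactors = FALSE          # keep both columns as character
)
data7
```

Here strip.white and stringsAsFactors are not read.fwf arguments themselves; they are forwarded to read.table through `...`.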

Do you also want to split Region into its comma-separated parts and convert the comma-formatted numbers into numeric values? You didn't say.
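In case the answer is yes, here is a minimal sketch of both conversions. It assumes every Region has the form "place, state" and that the commas in Value are thousands separators; the data frame `df` and the new column names `Place` and `State` are made up for illustration.

```r
# Two sample rows in the shape produced by reading the file as character.
df <- data.frame(
  Region = c("New York, NY", "Alexandria IC, VA"),
  Value  = c("66,834.6", " 8,552.2"),
  stringsAsFactors = FALSE
)

# Drop the thousands separators, then convert to numeric.
df$Value <- as.numeric(gsub(",", "", trimws(df$Value), fixed = TRUE))

# Split "place, state" on the comma followed by a space.
parts <- strsplit(trimws(df$Region), ", ", fixed = TRUE)
df$Place <- vapply(parts, `[`, character(1), 1)
df$State <- vapply(parts, `[`, character(1), 2)
df
```

If a Region ever contained more than one comma, the split would need a different rule, for example splitting only on the last comma.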