Question

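I am trying to read data from a CSV file into a data frame. The data contains names that I do not want read in as factors. I cannot use the stringsAsFactors=FALSE argument, because I want the other columns to be factors.

How can I achieve the desired behavior?

Note: the data has thousands of columns, and I only need to change the data type of one of them. The types assigned to the remaining columns by default are fine.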

Answer 1

Use the colClasses argument to specify the type of each column. For example:

x <- read.csv("myfile.csv", colClasses=c("numeric","factor","character"))

Answer 2

You can specify the column classes. From ?read.table:

colClasses: character.  A vector of classes to be assumed for the
      columns.  Recycled as necessary, or if the character vector
      is named, unspecified values are taken to be 'NA'.

      Possible values are 'NA' (the default, when 'type.convert' is
      used), '"NULL"' (when the column is skipped), one of the
      atomic vector classes (logical, integer, numeric, complex,
      character, raw), or '"factor"', '"Date"' or '"POSIXct"'.
      Otherwise there needs to be an 'as' method (from package
      'methods') for conversion from '"character"' to the specified
      formal class.

      Note that 'colClasses' is specified per column (not per
      variable) and so includes the column of row names (if any).

Something like:

types = c("numeric", "character", "factor")
read.table("file.txt", colClasses = types)

should do the trick.
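The documentation excerpt above also mentions the special values "NULL" (skip the column) and "Date". A minimal sketch of both, assuming a small made-up CSV held in a hypothetical csv_text variable rather than a real file:

```r
# Hypothetical sample data, used only for illustration.
csv_text <- "id,when,note,score
1,2020-01-05,first,3.5
2,2020-02-10,second,4.0"

# One entry per column: integer, Date, skipped ("NULL"), numeric.
types <- c("integer", "Date", "NULL", "numeric")
df <- read.csv(text = csv_text, colClasses = types)

names(df)          # the "note" column was skipped
sapply(df, class)  # class of each remaining column
```

Note that "NULL" here is the quoted string, not the NULL object.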

Personally, I would just read the columns in as strings or factors and then convert the one column that needs changing.
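A minimal sketch of that read-then-convert approach, assuming the column to fix is named b (the sample data and the csv_text name are made up for illustration). stringsAsFactors is passed explicitly because its default changed from TRUE to FALSE in R 4.0.0:

```r
# Hypothetical sample data, used only for illustration.
csv_text <- "a,b,c
1,asdf,morning
5,fiewhn,evening"

# Read every text column as a factor...
df <- read.csv(text = csv_text, stringsAsFactors = TRUE)

# ...then convert only the column that should hold plain strings.
df$b <- as.character(df$b)

sapply(df, class)
```

This avoids building a colClasses vector at all, at the cost of creating the factor first and then discarding it.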

Answer 3

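As the documentation quoted in a previous answer states, if you know the name of the column before reading in the data, you can use a named character vector to set the class of just that column.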

types <- c(b="character") #Set the column named "b" to character
df <- read.table(header=TRUE,sep=",",colClasses=types,text="
a,b,c,d,e
1,asdf,morning,4,greeting
5,fiewhn,evening,12,greeting
9,ddddd,afternoon,292,farewell
33,eianzpod,evening,1111,farewell
191,dnmxzcv,afternoon,394,greeting
")
sapply(df,class)
#          a           b           c           d           e 
#  "integer" "character"    "factor"   "integer"    "factor"

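If there is no header, you can also do it by position, using the default name that read.table assigns to the column (V2 for the second column):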

types <- c(V2="character") #Set the second column to character
df <- read.table(header=FALSE,sep=",",colClasses=types,text="
1,asdf,morning,4,greeting
5,fiewhn,evening,12,greeting
9,ddddd,afternoon,292,farewell
33,eianzpod,evening,1111,farewell
191,dnmxzcv,afternoon,394,greeting
")
sapply(df,class)
#       V1          V2          V3          V4          V5 
#"integer" "character"    "factor"   "integer"    "factor"

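Finally, if you know the position but the file does have a header, you can build a vector of the appropriate length. For colClasses, NA means "use the default".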

types <- rep.int(NA_character_,5) #make this length the number of columns
types[2] <- "character" #force the second column as character
df <- read.table(header=TRUE,sep=",",colClasses=types,text="
a,b,c,d,e
1,asdf,morning,4,greeting
5,fiewhn,evening,12,greeting
9,ddddd,afternoon,292,farewell
33,eianzpod,evening,1111,farewell
191,dnmxzcv,afternoon,394,greeting
")
sapply(df,class)
#          a           b           c           d           e 
#  "integer" "character"    "factor"   "integer"    "factor"
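With thousands of columns, the length of that vector is probably best computed rather than typed in. One possible sketch, assuming it is acceptable to read the header line once beforehand; csv_text and n_cols are hypothetical names, and with a real file you would pass the file path instead of text =:

```r
# Hypothetical sample data, used only for illustration.
csv_text <- "a,b,c,d,e
1,asdf,morning,4,greeting
5,fiewhn,evening,12,greeting"

# Read just the header and first row to learn how many columns there are.
n_cols <- length(names(read.csv(text = csv_text, nrows = 1)))

# NA means "use the default type conversion" for that column.
types <- rep(NA_character_, n_cols)
types[2] <- "character"  # force the second column to character

# stringsAsFactors is set explicitly so the other text columns become
# factors regardless of the R version's default.
df <- read.csv(text = csv_text, colClasses = types, stringsAsFactors = TRUE)
sapply(df, class)
```

The extra pass reads only two lines, so it should add little overhead even for a wide file.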
