Specifying colClasses in read.table using the class function

Time: 2015-06-26 16:41:17

Tags: r read.table

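Is there a way to use read.table() to read all or part of a file, use the class function to get the column types, modify some of those types, and then re-read the file?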

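Basically, my columns are zero-padded integers that I would like to treat as strings. If I let read.table() just do its thing, it of course assumes these are numbers, strips the leading zeros, and makes the column type integer. The thing is, I have a fair number of columns, so although I could create a character vector specifying every column, I only want to change R's best guess for a few of them. What I would like to do is read the first few rows: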

myTable <- read.table("//myFile.txt", sep="\t", quote="\"", header=TRUE, stringsAsFactors=FALSE, nrows = 5)

Then get the column classes:

colTypes <- sapply(myTable, class)
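For reference, sapply() applies class() to each column of the data frame and simplifies the results into a character vector named after the columns; if any column carried more than one class (a POSIXct column, for example), it would return a list instead. A minimal sketch, using a made-up inline sample in place of myFile.txt:

```r
# Made-up sample standing in for the first rows of myFile.txt
sample_txt <- "id\tcount\tlabel\n000001\t5\ta\n000002\t7\tb"

myTable <- read.table(text = sample_txt, sep = "\t", header = TRUE,
                      stringsAsFactors = FALSE)

# One class per column, named after the columns;
# the zero-padded id column is guessed as integer
colTypes <- sapply(myTable, class)
print(colTypes)
```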

Change a few of the column types, e.g.:

colTypes[1] <- "character"
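Since colTypes carries the column names, entries can also be overridden by name rather than by position, which keeps working if the column order changes. A small sketch with hypothetical column names:

```r
# Hypothetical class vector, shaped like the result of sapply(myTable, class)
colTypes <- c(id = "integer", count = "integer", label = "character")

# Override by name instead of by index; the names are preserved
colTypes["id"] <- "character"
print(colTypes)
```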

Then re-read the file with the modified column types:

myTable <- read.table("//myFile.txt", sep="\t", quote="\"", colClasses=colTypes, header=TRUE, stringsAsFactors=FALSE, nrows = 5)

Although this seems like a perfectly reasonable thing to do, and colTypes = c("character") works fine, when I actually try it I get:

scan() expected 'an integer', got '"000001"'

Both class(colTypes) and class(c("character")) return "character", so what is the problem?

1 answer:

Answer 0 (score: 0)

Use the colClasses = argument of read.table to specify which columns should be read as character. For example:

txt <- 
"var1, var2, var3
 0001, 0002, 1
 0003, 0004, 2"
df <- 
read.table(
    text = txt,
    sep = ",",
    header = TRUE,
    strip.white = TRUE,       ## drop the space after each comma in character fields
    colClasses = "character") ## read all as characters
df
df2 <- 
read.table(
    text = txt, 
    sep = ",",
    header = TRUE,
    strip.white = TRUE,
    colClasses = c("character", "character", "double")) ## the third column is numeric
df2
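If only a few columns need overriding, a named colClasses vector may avoid the two-pass read entirely: the current R documentation for read.table states that when colClasses is named, the names are matched to the column names and unspecified columns are taken to be NA, i.e. left to R's usual guess. A sketch, assuming an R version with this behaviour; strip.white = TRUE drops the spaces after each comma, which would otherwise be kept in the character columns:

```r
txt <- "var1, var2, var3
 0001, 0002, 1
 0003, 0004, 2"

# Only var1 and var2 are forced to character;
# var3 is not named, so its class is guessed as usual
df3 <- read.table(
    text = txt,
    sep = ",",
    header = TRUE,
    strip.white = TRUE,
    colClasses = c(var1 = "character", var2 = "character"))
str(df3)
```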

[Update...] Alternatively, you can set and then reset colClasses with a vector...

df <- 
read.table(
    text = txt, 
    sep = ",",
    header = TRUE)
df

## they're all currently read as integer
myColClasses <-   
sapply(df, class)

## create a vector of column names for the zero-padded variables
zero_padded <-    
c("var1", "var2")

## if a name is in zero_padded, return "character", else leave it be
myColClasses <-   
ifelse(names(myColClasses) %in% zero_padded, 
       "character", 
       myColClasses)

## read in with colClasses set to myColClasses
df2 <- 
read.table(
    text = txt, 
    sep = ",",
    strip.white = TRUE, ## otherwise var1 and var2 keep a leading space
    colClasses = myColClasses,
    header = TRUE)
df2
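The same two-pass idea can be wrapped in a small reusable function. read_zero_padded() below is a hypothetical helper, not part of any package, and the sketch writes the sample data to a temporary file so that it reads from a file the way the question does:

```r
# Hypothetical helper: sample a few rows, let R guess the classes,
# force the named columns to character, then re-read the whole file.
read_zero_padded <- function(file, zero_padded, sample_rows = 5, ...) {
  # First pass: read only a sample so the guess is cheap
  sample_df <- read.table(file, nrows = sample_rows, ...)
  classes <- sapply(sample_df, class)
  # Override only the requested columns; names are kept
  classes[names(classes) %in% zero_padded] <- "character"
  # Second pass: read everything with the adjusted classes
  read.table(file, colClasses = classes, ...)
}

# Write some zero-padded sample data to a temporary file
tmp <- tempfile(fileext = ".txt")
writeLines(c("var1,var2,var3", "0001,0002,1", "0003,0004,2"), tmp)

df4 <- read_zero_padded(tmp, zero_padded = c("var1", "var2"),
                        sep = ",", header = TRUE)
str(df4)
unlink(tmp)
```

One caveat of any sample-based approach: the guessed classes only reflect the first sample_rows rows, so a column that looks like integers in the sample (or is entirely NA there, and is therefore guessed as logical) can still make the second read fail on later rows.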