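Is there a way to use read.table() to read all or part of a file, use the class function to get the column types, modify those types, and then re-read the file?
Basically, my columns are zero-padded integers that I want to treat as strings. If I let read.table() just do its thing, it of course assumes they are numbers, strips the leading zeros, and makes the column type integer. The thing is, I have a fair number of columns, so while I could create a character vector specifying every one of them, I only want to change a few of R's best guesses. What I would like to do is read the first few lines: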
myTable <- read.table("//myFile.txt", sep="\t", quote="\"", header=TRUE, stringsAsFactors=FALSE, nrows = 5)
Then get the column classes:
colTypes <- sapply(myTable, class)
Change a few of the column types, e.g.:
colTypes[1] <- "character"
Then re-read the file using the modified column types:
myTable <- read.table("//myFile.txt", sep="\t", quote="\"", colClasses=colTypes, header=TRUE, stringsAsFactors=FALSE, nrows = 5)
While this seems like an entirely reasonable thing to do, and colTypes = c("character") works fine, when I actually try it I get:
scan() expected 'an integer', got '"000001"'
class(colTypes) and class(c("character")) both return "character", so what is the problem?
Answer 0 (score: 0)
Use read.table's colClasses = argument to specify which columns should be read as character. For example:
txt <-
"var1, var2, var3
0001, 0002, 1
0003, 0004, 2"
df <- read.table(
  text = txt,
  sep = ",",
  header = TRUE,
  strip.white = TRUE,          ## drop the space after each comma in character fields
  colClasses = "character")    ## read all columns as character
df
df2 <- read.table(
  text = txt,
  sep = ",",
  header = TRUE,
  strip.white = TRUE,
  colClasses = c("character", "character", "double"))  ## the third column is numeric
df2
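As a quick check of the point above, here is a minimal sketch that reuses the sample data from this answer and compares the default guess with a read that forces character columns. The names `guessed` and `as_text` are introduced here purely for illustration, and `strip.white = TRUE` is an assumption added so that the space after each comma does not end up inside the character values.

```r
txt <- "var1, var2, var3
0001, 0002, 1
0003, 0004, 2"

## default behaviour: read.table() guesses the column types
guessed <- read.table(text = txt, sep = ",", header = TRUE)

## forced behaviour: every column is kept as text
as_text <- read.table(text = txt, sep = ",", header = TRUE,
                      colClasses = "character", strip.white = TRUE)

sapply(guessed, class)   ## classes chosen by read.table()
guessed$var1             ## leading zeros are gone: 1 3
as_text$var1             ## leading zeros are kept: "0001" "0003"
```

Comparing the two results side by side makes it easy to confirm that the padding survives only when the column is declared as character.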
[Update...] Alternatively, you can set and then reset colClasses using a vector:
df <-
read.table(
text = txt,
sep = ",",
header = TRUE)
df
## they're all currently read as integer
myColClasses <-
sapply(df, class)
## create a vector of column names for zero padded variable
zero_padded <-
c("var1", "var2")
## if a name is in zero_padded, return "character", else leave it be
myColClasses <-
ifelse(names(myColClasses) %in% zero_padded,
"character",
myColClasses)
## read in with colClasses set to myColClasses
## (strip.white = TRUE drops the space after each comma in character fields)
df2 <- read.table(
  text = txt,
  sep = ",",
  colClasses = myColClasses,
  strip.white = TRUE,
  header = TRUE)
df2
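The error message in the question shows the value with its literal quotes ('"000001"'), which hints at one possible cause. As far as I can tell from ?scan, quoting is only interpreted in character fields, so a quoted zero-padded value in a column that is still declared "integer" would be rejected on the re-read. The sketch below tries to reproduce that under stated assumptions: the file contents, the column names `id`, `code` and `n`, and the variable names are all hypothetical and not taken from the original file.

```r
## Hypothetical tab-separated file with quoted, zero-padded fields
path <- tempfile(fileext = ".txt")
writeLines(c("id\tcode\tn",
             "\"000001\"\t\"0007\"\t1",
             "\"000002\"\t\"0008\"\t2"), path)

## Step 1: let read.table() guess the classes
guess <- read.table(path, sep = "\t", quote = "\"", header = TRUE,
                    stringsAsFactors = FALSE)
col_types <- sapply(guess, class)

## Step 2: override only the first column, as in the question;
## the quoted "code" column is still declared as integer
col_types[1] <- "character"
partial <- tryCatch(
  read.table(path, sep = "\t", quote = "\"", header = TRUE,
             colClasses = col_types),
  error = function(e) conditionMessage(e))
partial   ## holds the error message if the re-read fails

## Step 3: override every zero-padded column by name
col_types[c("id", "code")] <- "character"
fixed <- read.table(path, sep = "\t", quote = "\"", header = TRUE,
                    colClasses = col_types)
fixed$code
```

If this is what is happening in the original file, the fix is the same as in the answer above: every quoted, zero-padded column needs to be listed as "character", not only the first one.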