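I noticed that when I read a large CSV file with output <- read.table( ..., header = TRUE, sep = ","), the resulting data frame has some blank columns. These columns follow the naming pattern: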
colnames(output)
"Factor.1" "Factor.2" "etc" "Stuff" "X" "X.1" "X.2" "X.3" "X.4" "X.5"
"X.6" "X.7" "X.8" "X.9" "X.10" "X.11" "X.12" "X.13"
"X.14" "X.15" "X.16" "X.17" "X.18" "X.19" "X.20" "X.21"
"X.22" "X.23" "X.24" "X.25" "X.26" "X.27" "X.28" "X.29"
"X.30" "X.31" "X.32" "X.33"
I also noticed that ?read.table states:
col.names: a vector of optional names for the variables. The default is to use "V" followed by the column number.
Why do I get X instead of V?
Edit: this is what the CSV file looks like:
Date,Duration,Count,Factor 1,Factor 2,Factor 3,Hour,Day,Month,Year,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
1/1/2012 0:00,9.99,10,GC,LS,FT,0,7,1,2012,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
1/1/2012 1:00,9.63125,8,GC,LS,FT,1,7,1,2012,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
1/1/2012 2:00,7.388888889,3,GC,LS,FT,2,7,1,2012,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
1/1/2012 3:00,7.087037037,9,GC,LS,FT,3,7,1,2012,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
...
Answer:
Here is the relevant part of read.table():
if (header) {
    .External(C_readtablehead, file, 1L, comment.char,
              blank.lines.skip, quote, sep, skipNul)
    if (missing(col.names))
        col.names <- first
    else if (length(first) != length(col.names))
        warning("header and 'col.names' are of different lengths")
}
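The line if (missing(col.names)) col.names <- first is the important one. From there, we can go back and find first, which, adapted to this case (using the data string at the end of this answer), is defined as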
first <- scan(textConnection(file), what = "", sep = ",",
nlines = 1, quiet = TRUE, skip = 0, strip.white = TRUE)
which gives
# [1] "Date" "Duration" "Count" "Factor 1" "Factor 2" "Factor 3" "Hour" "Day" "Month"
# [10] "Year" "" "" "" "" "" "" "" ""
# [19] "" "" "" "" "" "" "" "" ""
# [28] "" "" "" "" "" "" "" "" ""
# [37] "" "" "" "" "" "" "" ""
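To see where the empty strings come from, here is a minimal sketch of the same scan() call on a shortened header line. The string hdr is hypothetical, made up for illustration: each trailing comma in the header produces one more empty field.

```r
# Hypothetical, shortened header line with three trailing commas
hdr <- "Date,Factor 1,Year,,,"

# Same call pattern as above: split the first line on commas, keep raw strings
first <- scan(text = hdr, what = "", sep = ",",
              nlines = 1, quiet = TRUE, skip = 0, strip.white = TRUE)

# first is c("Date", "Factor 1", "Year", "", "", "")
print(first)
```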
Later, because check.names = TRUE by default, make.names() is applied to col.names, which produces your names:
make.names(first, unique = TRUE)
# [1] "Date" "Duration" "Count" "Factor.1" "Factor.2" "Factor.3" "Hour" "Day" "Month"
# [10] "Year" "X" "X.1" "X.2" "X.3" "X.4" "X.5" "X.6" "X.7"
# [19] "X.8" "X.9" "X.10" "X.11" "X.12" "X.13" "X.14" "X.15" "X.16"
# [28] "X.17" "X.18" "X.19" "X.20" "X.21" "X.22" "X.23" "X.24" "X.25"
# [37] "X.26" "X.27" "X.28" "X.29" "X.30" "X.31" "X.32" "X.33"
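The X itself comes from make.names(): an empty string is not a syntactically valid name, so it becomes "X", and unique = TRUE then de-duplicates the repeats (via make.unique()) into X.1, X.2, and so on. A minimal check on a hypothetical vector of header fields:

```r
# Hypothetical header fields: one containing a space, three empty ones
hdr_fields <- c("Factor 1", "", "", "")

# make.names() replaces invalid characters with "." and turns "" into "X";
# unique = TRUE appends .1, .2, ... to repeated names
fixed <- make.names(hdr_fields, unique = TRUE)

# fixed is c("Factor.1", "X", "X.1", "X.2")
print(fixed)
```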
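The reason you get X rather than the V mentioned in the documentation is that the V default lives in the next condition after if (header):
else if (missing(col.names))
    col.names <- paste0("V", 1L:cols)
Since you read the file with header = TRUE, that branch is never reached; instead, make.names() turns each empty name into X (followed by X.1, X.2, ... to keep the names unique). There is more going on than this explanation covers, so the best way to understand it fully is to step through the read.table source (it is complicated).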
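The two branches can be compared directly. This is a minimal sketch using a hypothetical two-line CSV string (csv is made up for illustration): with header = FALSE the paste0("V", 1L:cols) branch runs, while with header = TRUE the empty header fields are passed through make.names().

```r
# Hypothetical CSV text: header with two trailing commas, one data row
csv <- "Date,Year,,\n1/1/2012,2012,,"

# header = FALSE: col.names defaults to paste0("V", 1:cols)
no_hdr <- read.table(text = csv, sep = ",", header = FALSE)
names(no_hdr)    # "V1" "V2" "V3" "V4"

# header = TRUE: names come from the first line, then make.names(unique = TRUE)
with_hdr <- read.table(text = csv, sep = ",", header = TRUE)
names(with_hdr)  # "Date" "Year" "X" "X.1"
```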
Data:
file <- "Date,Duration,Count,Factor 1,Factor 2,Factor 3,Hour,Day,Month,Year,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
1/1/2012 0:00,9.99,10,GC,LS,FT,0,7,1,2012,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
1/1/2012 1:00,9.63125,8,GC,LS,FT,1,7,1,2012,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
1/1/2012 2:00,7.388888889,3,GC,LS,FT,2,7,1,2012,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
1/1/2012 3:00,7.087037037,9,GC,LS,FT,3,7,1,2012,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"