I'm having trouble with column names that contain punctuation. I diagnosed the problem as follows:
file <- "./spam.data.txt"
columnNames <- c('word_freq_make',
'word_freq_address',
...
'word_freq_table',
'word_freq_conference',
'char_freq_;',
'char_freq_(',
'char_freq_[',
'char_freq_!',
'char_freq_$',
'char_freq_#',
'capital_run_length_average',
...)
spamd <- read.table(file, sep = "" , header = F, stringsAsFactors= F,
col.names = columnNames)
# First look
spamd$word_freq_85 # [1] 0 0 0 0 0 0 0 0 1 0 1 ...
spamd$char_freq_; # NULL
colnames(spamd)
The output of colnames() is:
[1] "word_freq_make" "word_freq_address" ...
[46] "word_freq_table" "word_freq_conference" "char_freq_." "char_freq_..1"
[51] "char_freq_..2" "char_freq_..3" "char_freq_..4" "char_freq_..5" "capital_run_length_average"
That is, the punctuation in the column names has been replaced with ".", "..1", "..2", "..3", and so on.
Why does this happen?
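For reference: read.table() has check.names = TRUE by default, and its documentation says names are then adjusted with make.names() so that they are syntactically valid and free of duplicates. The sketch below reproduces the renaming on a hypothetical three-name subset of the columns above; the inline data ("0 1 2") is made up for illustration.

```r
# Hypothetical subset of the punctuation-bearing names from the question
punct_names <- c("char_freq_;", "char_freq_(", "char_freq_[")

# make.names() turns each invalid character into "." (so all three become
# "char_freq_."), then unique = TRUE de-duplicates them via make.unique(),
# which appends ".1", ".2", ... to the repeats
fixed <- make.names(punct_names, unique = TRUE)
# fixed is c("char_freq_.", "char_freq_..1", "char_freq_..2")

# read.table() with its default check.names = TRUE gives the same names
df_default <- read.table(text = "0 1 2", col.names = punct_names)
print(names(df_default))
```

This is why the output shows "char_freq_." followed by "char_freq_..1", "char_freq_..2", ...: every punctuation column collapses to the same name before being made unique.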
EDIT in response to akrun's answer:
Using:
spamd <- read.table(file, sep = "" , header = F, stringsAsFactors= F,
col.names = columnNames, check.names = FALSE)
instead fixes the renaming problem; colnames() now gives:
[41] "word_freq_cs" "word_freq_meeting" "word_freq_original" "word_freq_project" "word_freq_re"
[46] "word_freq_edu" "word_freq_table" "word_freq_conference" "char_freq_;" "char_freq_("
[51] "char_freq_[" "char_freq_!" "char_freq_$"
But if I try spamd$char_freq_X, where X is any punctuation character, I still get NULL. So how do I access these columns?
Thanks
Answer 0 (score: 1)
We need to use check.names = FALSE:
spamd <- read.table(file, sep = "" , header = F, stringsAsFactors= F,
col.names = columnNames, check.names = FALSE)
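On the follow-up question: with check.names = FALSE the names are kept verbatim, but they are not syntactic names, so they cannot be typed bare after $. In spamd$char_freq_;, for example, the parser treats ; as the end of the statement, so R looks up a column called char_freq_, which does not exist. A non-syntactic name has to be quoted with backticks after $, or passed as a string to [[ ]] or [ ]. A minimal sketch on a small made-up table (spamd_demo and its values are hypothetical):

```r
# Made-up two-row table whose column names contain punctuation
spamd_demo <- read.table(text = "0 1\n2 3",
                         col.names = c("char_freq_;", "char_freq_("),
                         check.names = FALSE)  # keep the names verbatim

# Backticks let `$` accept a non-syntactic name
semi <- spamd_demo$`char_freq_;`

# [[ ]] and [ ] take the column name as an ordinary string
paren <- spamd_demo[["char_freq_("]]
both  <- spamd_demo[, c("char_freq_;", "char_freq_(")]

print(semi)   # [1] 0 2
print(paren)  # [1] 1 3
```

The string forms ([[ ]] and [ ]) are also convenient when the column name is held in a variable, e.g. inside a loop over columnNames.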