Punctuation in colnames is replaced by dots

Date: 2017-03-09 12:36:53

Tags: r

I am having trouble with column names that contain punctuation. I have narrowed the problem down as follows:

file <- "./spam.data.txt"
columnNames <- c('word_freq_make',       
                 'word_freq_address',               
                 ...        
                 'word_freq_table',        
                 'word_freq_conference',   
                 'char_freq_;',            
                 'char_freq_(',            
                 'char_freq_[',            
                 'char_freq_!',            
                 'char_freq_$',            
                 'char_freq_#',            
                 'capital_run_length_average', 
                  ...)
spamd <- read.table(file, sep = "" , header = F, stringsAsFactors= F,
                    col.names = columnNames)

# First look
spamd$word_freq_85         # [1] 0 0 0 0 0 0 0 0 1 0 1 ...
spamd$char_freq_;          # NULL
colnames(spamd)  

The output of colnames() is:

 [1] "word_freq_make"             "word_freq_address"       ...           


[46]  "word_freq_table"            "word_freq_conference"       "char_freq_."                "char_freq_..1"             
[51] "char_freq_..2"              "char_freq_..3"              "char_freq_..4"              "char_freq_..5"              "capital_run_length_average"

That is, the punctuation in the column names has been replaced by ".", "..1", "..2", "..3", and so on.

Why does this happen?

Edit, in response to AKRUN's reply:

Using:

spamd <- read.table(file, sep = "" , header = F, stringsAsFactors= F,
                    col.names = columnNames, check.names = FALSE)

instead solves the renaming problem. That is, colnames() now yields:

[41] "word_freq_cs"               "word_freq_meeting"          "word_freq_original"         "word_freq_project"          "word_freq_re"              
[46] "word_freq_edu"              "word_freq_table"            "word_freq_conference"       "char_freq_;"                "char_freq_("               
[51] "char_freq_["                "char_freq_!"                "char_freq_$"  

But if I try spamd$char_freq_X, where X is any punctuation character, I still get NULL. So how do I access these columns?

Thanks

1 Answer:

Answer 0: (score: 1)

We need to use check.names=FALSE:

spamd <- read.table(file, sep = "" , header = F, stringsAsFactors= F,
                col.names = columnNames, check.names = FALSE)
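
As for why the renaming happens and how to reach the columns afterwards, here is a minimal sketch. It assumes a small made-up data set in place of spam.data.txt and only four of the column names; the toy values are purely illustrative. With the default check.names = TRUE, read.table() runs the names through make.names(unique = TRUE), which turns each invalid character into "." and then adds a numeric suffix to make the duplicates unique. Once the original names are kept, they are no longer syntactic R names, so they have to be quoted with backticks after `$` or passed as strings to `[[`. In `spamd$char_freq_;` the `;` is parsed as a statement terminator, so it never becomes part of the name.

```r
# Hypothetical subset of the column names from the question.
columnNames <- c("word_freq_table", "char_freq_;", "char_freq_(", "char_freq_[")

# What check.names = TRUE does behind the scenes: invalid characters
# become "." and duplicates receive a numeric suffix.
fixed <- make.names(columnNames, unique = TRUE)
print(fixed)
# "word_freq_table" "char_freq_." "char_freq_..1" "char_freq_..2"

# Toy data standing in for the real file (values are made up).
toy <- "1 0.1 0.2 0.3\n2 0.4 0.5 0.6"
spamd <- read.table(text = toy, header = FALSE, stringsAsFactors = FALSE,
                    col.names = columnNames, check.names = FALSE)

# Non-syntactic names: quote with backticks after `$`,
# or index with a character string via [[ ]].
semi  <- spamd$`char_freq_;`
paren <- spamd[["char_freq_("]]
print(semi)
print(paren)
```

If typing backticks everywhere is inconvenient, another option is to keep the default renaming and choose descriptive syntactic names yourself (for example char_freq_semicolon), though that is a design choice rather than something the question requires.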