R: Reading a .csv file and converting it to a multi-column data frame

Time: 2017-12-13 21:49:52

Tags: r csv

I am new to R and am having a lot of trouble reading a .csv file and converting it to a data.frame with 7 columns. This is what I am doing:

    gene_symbols_table <- as.data.frame(read.csv(
      file = "/home/nikita/Desktop/CElegans_raw_data/gene_symbols_matching.csv",
      header = TRUE, sep = ","))

This gives me a data.frame with dim = 46761 x 1, but I need it to be 46761 x 7. I have tried the approaches in the following Stack Overflow posts:

  1. How can you read a CSV file in R with different number of columns

  2. read.delim() - errors "more columns than column names" and "header and 'col.names' are of different lengths"

  3. Split a column of a data frame to multiple columns

But somehow none of them worked in my case. The table looks like this:

    > head(gene_symbols_table, 3)
      input.reason.matches.organism.name.primaryIdentifier.symbol.briefDescription.class.secondaryIdentifier
    1 WBGene00008675 MATCH 1 Caenorhabditis elegans WBGene00008675 irld-26  Gene F11A5.7
    2 WBGene00008676 MATCH 1 Caenorhabditis elegans WBGene00008676 oac-15  Gene F11A5.8
    3 WBGene00008677 MATCH 1 Caenorhabditis elegans WBGene00008677   Gene F11A5.9
    

Opened in Excel, the .csv file looks like this:

    input          | reason | matches | organism.name          | primaryIdentifier | symbol  | briefDescription
    WBGene00008675 | MATCH  | 1       | Caenorhabditis elegans | WBGene00008675    | irld-26 | ...
    ...
    

The following code:

    gene_symbols_table <- read.table(
      file = "/home/nikita/Desktop/CElegans_raw_data/gene_symbols_matching.csv",
      header = FALSE, sep = ",",
      col.names = paste0("V", seq_len(7)), fill = TRUE)
    

seems to work, but when I check dim I can see right away that it is wrong: 20124 x 7. Then:

      V1
    1 input;reason;matches;organism.name;primaryIdentifier;symbol;briefDescription;class;secondaryIdentifier
    2 WBGene00008675;MATCH;1;Caenorhabditis elegans;WBGene00008675;irld-26;;Gene;F11A5.7
    3 WBGene00008676;MATCH;1;Caenorhabditis elegans;WBGene00008676;oac-15;;Gene;F11A5.8
      V2 V3 V4 V5
    1
    2
    3
    


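So this is wrong.

My other attempts with read.table give the errors described in the second Stack Overflow thread listed above.

I also tried splitting the one-column data.frame into 7 columns, but so far without success.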

1 Answer:

Answer 0: (score: 0)

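Judging from the table output, the separator appears to be a space or a semicolon rather than a comma. So try specifying it explicitly, or try fread from the data.table package, which detects the separator automatically.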

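    library(data.table)

    # fread guesses the separator; as.data.frame converts the data.table result
    gene_symbols_table <- as.data.frame(fread(
      file = "/home/nikita/Desktop/CElegans_raw_data/gene_symbols_matching.csv",
      header = TRUE))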
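
If you would rather stay in base R, specifying the separator could look roughly like the sketch below. It assumes the file is semicolon-delimited, which is what the second output in the question suggests; the `csv_text` variable is only a stand-in built from the three lines shown there, not the real file. Note that the header in that output lists nine fields, so the result would have nine columns rather than seven.

```r
# Stand-in for the real file: header plus two rows copied from the question's output.
csv_text <- "input;reason;matches;organism.name;primaryIdentifier;symbol;briefDescription;class;secondaryIdentifier
WBGene00008675;MATCH;1;Caenorhabditis elegans;WBGene00008675;irld-26;;Gene;F11A5.7
WBGene00008676;MATCH;1;Caenorhabditis elegans;WBGene00008676;oac-15;;Gene;F11A5.8"

# Read with an explicit semicolon separator (assumption: the real file uses ';').
# For the actual file, replace `text = csv_text` with `file = "<path to the .csv>"`.
gene_symbols_table <- read.csv(text = csv_text, header = TRUE, sep = ";",
                               stringsAsFactors = FALSE)

# One column per header field instead of a single fused column.
dim(gene_symbols_table)
names(gene_symbols_table)
```

To confirm which separator the real file uses before reading it, `readLines(path, n = 3)` prints the first raw lines without any parsing.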