Reading a text file with missing data

Asked: 2016-08-12 00:19:48

Tags: r missing-data read.csv

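I want to read a text file with 40 variables, some of whose values are missing, into a data frame with 40 columns. However, when I use the usual read.csv, the data is not read correctly and the data frame ends up with only 38 columns. My guess is that the missing values are responsible.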

Here is a sample of the text file:

0   1   1   5   0   1382    4   15  2   181 1   2       2   68fd1e64    80e26c9b    fb936136    7b4723c4    25c83c98    7e0ccccf    de7995b8    1f89b562    a73ee510    a8cd5504    b2cb9c98    37c9c164    2824a5f6    1adce6ef    8ba8b39a    891b62e7    e5ba7672    f54016b9    21ddcdc9    b1252a9d    07b5194c        3a171ecb    c5c50484    e8b83407    9727dd16
0   2   0   44  1   102 8   2   2   4   1   1       4   68fd1e64    f0cf0024    6f67f7e5    41274cd7    25c83c98    fe6b92e5    922afcc0    0b153874    a73ee510    2b53e5fb    4f1b46f3    623049e6    d7020589    b28479f6    e6c5b5cd    c92f3b61    07c540c4    b04e4670    21ddcdc9    5840adea    60f6221e        3a171ecb    43f13e8b    e8b83407    731c3655
0   2   0   1   14  767 89  4   2   245 1   3   3   45  287e684f    0a519c5c    02cf9876    c18be181    25c83c98    7e0ccccf    c78204a1    0b153874    a73ee510    3b08e48b    5f5e6091    8fe001f4    aa655a2f    07d13a8f    6dc710ed    36103458    8efede7f    3412118d            e587c466    ad3062eb    3a171ecb    3b183c5c        
0       893         4392        0   0   0       0           68fd1e64    2c16a946    a9a87e68    2e17d6f6    25c83c98    fe6b92e5    2e8a689b    0b153874    a73ee510    efea433b    e51ddf94    a30567ca    3516f6e6    07d13a8f    18231224    52b8680f    1e88c74f    74ef3502            6b3a5ca6        3a171ecb    9117a34a        
0   3   -1      0   2   0   3   0   0   1   1       0   8cf07265    ae46a29d    c81688bb    f922efad    25c83c98    13718bbd    ad9fa255    0b153874    a73ee510    5282c137    e5d8af57    66a76a26    f06c53ac    1adce6ef    8ff4b403    01adbab4    1e88c74f    26b3c7a7            21c9516a        32c7478e    b34f3128        
0       -1          12824       0   0   6       0           05db9164    6c9c9cf3    2730ec9c    5400db8b    43b19349    6f6d9be8    53b5f978    0b153874    a73ee510    3b08e48b    91e8fc27    be45b877    9ff13f22    07d13a8f    06969a20    9bc7fff5    776ce399    92555263            242bb710    8ec974f4    be7c41b4    72c78f11        
0       1   2       3168        0   1   2       0           439a44a4    ad4527a2    c02372d0    d34ebbaa    43b19349    fe6b92e5    4bc6ffea    0b153874    a73ee510    3b08e48b    a4609aab    14d63538    772a00d7    07d13a8f    f9d1382e    b00d3dc9    776ce399    cdfa8259            20062612        93bad2c0    1b256e61        
1   1   4   2   0   0   0   1   0   0   1   1       0   68fd1e64    2c16a946    503b9dbc    e4dbea90    f3474129    13718bbd    38eb9cf4    1f89b562    a73ee510    547c0ffe    bc8c9f21    60ab2f07    46f42a63    07d13a8f    18231224    e6b6bdc7    e5ba7672    74ef3502            5316a17f        32c7478e    9117a34a        
0       44  4   8   19010   249 28  31  141     1       8   05db9164    d833535f    d032c263    c18be181    25c83c98    7e0ccccf    d5b6acf2    0b153874    a73ee510    2acdcf4e    086ac2d2    dfbb09fb    41a6ae00    b28479f6    e2502ec9    84898b2a    e5ba7672    42a2edb9            0014c32a        32c7478e    3b183c5c        
0       35      1   33737   21  1   2   3       1       1   05db9164    510b40a5    d03e7c24    eb1fd928    25c83c98        52283d1c    0b153874    a73ee510    015ac893    e51ddf94    951fe4a9    3516f6e6    07d13a8f    2ae4121c    8ec71479    d4bb7bd8    70d0f5f9            0e63fca0        32c7478e    0e8fe315        
0       2   632 0   56770       0   5   65      0       2   05db9164    0468d672    7ae80d0f    80d8555a    25c83c98    7e0ccccf    04277bf9    0b153874    7cc72ec2    3b08e48b    7e2c5c15    cfc86806    91a1b611    b28479f6    58251aab    146a70fd    776ce399    0b331314    21ddcdc9    5840adea    cbec39db        3a171ecb    cedad179    ea9a246c    9a556cfc
0   0   6   6   6   421 109 1   7   107 0   1       6   05db9164    9b5fd12f            4cf72387        111121f4    0b153874    a73ee510    3b08e48b    ac9c2e8f        6e2d6a15    07d13a8f    796a1a2e        d4bb7bd8    8aaa5b67                    32c7478e            
1   0   -1          1465    0   17  0   4   0   4           241546e0    38a947a1    fa673455    6a14f9b9    25c83c98    fe6b92e5    1c86e0eb    1f89b562    a73ee510    e7ba2569    755e4a50    208d9687    5978055e    07d13a8f    5182f694    f8b34416    e5ba7672    e5f8f18f            f3ddd519        32c7478e    b34f3128        
1       2   11  5   10262   34  2   4   5       1       5   be589b51    287130e0    cd7a7a22    fb7334df    25c83c98        6cdb3998    361384ce    a73ee510    3ff10fb2    5874c9c9    976cbd4c    740c210d    1adce6ef    310d155b    07eb8110    07c540c4    891589e7    18259a83    a458ea53    a0ab60ca        32c7478e    a052b1ed    9b3e8820    8967c0d2
0   0   51  84  4   3633    26  1   4   8   0   1       4   5a9ed9b0    80e26c9b    97144401    5dbf0cc5    0942e0a7    13718bbd    9ce6136d    0b153874    a73ee510    2106e595    b5bb9d63    04f55317    ab04d8fe    1adce6ef    0ad47a49    2bd32e5c    3486227d    12195b22    21ddcdc9    b1252a9d    fa131867        dbb486d7    8ecc176a    e8b83407    c43c3f58
0       2   1   18  20255       0   1   1306        0       20  05db9164    bc6e3dc1    67799c69    d00d0f35    4cf72387    7e0ccccf    ca4fd8f8    64523cfa    a73ee510    3b08e48b    a0060bca    b9f28c33    22d23aac    5aebfb83    d702713a    0f655650    776ce399    3a2028fd            b426bc93        3a171ecb    2e0a0035        
1   1   987     2   105 2   1   2   2   1   1       2   68fd1e64    38d50e09    da603082    431a5096    43b19349    7e0ccccf    3f35b640    0b153874    a73ee510    3b08e48b    3d5fb018    6aaab577    94172618    07d13a8f    ee569ce2    2f03ef40    d4bb7bd8    582152eb    21ddcdc9    b1252a9d    3b203ca1        32c7478e    b21dc903    001f3601    aa5f0a15
0   0   1       0   16597   557 3   5   123 0   1       1   8cf07265    7cd19acc    77f2f2e5    d16679b9    4cf72387    fbad5c96    8fb24933    0b153874    a73ee510    0095a535    3617b5f5    9f32b866    428332cf    b28479f6    83ebd498    31ca40b6    e5ba7672    d0e5eb07            dfcfc3fa    ad3062eb    32c7478e    aee52b6f        
0   0   24  4   2   2056    12  6   10  83  0   1       2   05db9164    f0cf0024    08b45d8b    cbb5af1b    384874ce    fbad5c96    81bb0302    37e4aa92    a73ee510    175d6c71    b7094596    1c547463    1f9d2c38    1adce6ef    55dc357b    0ca69655    e5ba7672    b04e4670    21ddcdc9    b1252a9d    f3caefdd        32c7478e    4c8e5aef    ea9a246c    9593bba9
0   7   102     3   780 15  7   15  15  1   1       3   3c9d8785    b0660259    3a960356    15c92ddb    4cf72387    13718bbd    00c46cd1    0b153874    a73ee510    62cfc6bd    8cffe207    656e5413    ff5626de    ad1cc976    27b1230c    fa8d05aa    e5ba7672    5edd90de            e12ce348        c3dc6cef    49045073        
1       47      0   6399    38  19  10  143     10      6   1464facd    38a947a1    223b0e16    ca55061c    25c83c98    7e0ccccf    6933dec1    5b392875    a73ee510    3b08e48b    860c302b    156f99ef    30735474    1adce6ef    0e78291e    5fbf4a84    e5ba7672    1999bae9            deb9605d        32c7478e    e448275f        
0   0   1   80  0   1848    287 1   4   46  0   1       4   05db9164    09e68b86    13b87f72    13a91973    25c83c98    7e0ccccf    cc5ed2f1    0b153874    a73ee510    3b08e48b    081c279a    d25f00b6    9f16a973    07d13a8f    36721ddc    1746d357    d4bb7bd8    5aed7436    a153cea2    a458ea53    dd37e0

1 Answer:

Answer 0 (score: 0)

With zwol's help, I used the following code, which also sets the column names and column classes:

# Read the tab-separated file; empty fields, single spaces and "NA" are treated as missing.
data <- read.table(file = "dac_sample.txt", sep = "\t", header = FALSE,
                   na.strings = c("", " ", "NA"),
                   # One label column, 13 numeric columns (I1..I13), 26 factor columns (C1..C26)
                   col.names = c("Label", paste0("I", 1:13), paste0("C", 1:26)),
                   colClasses = c("factor", rep("numeric", 13), rep("factor", 26)))

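One plausible reason the explicit sep = "\t" matters (an assumption, since the question does not show the original call): when fields are split on runs of whitespace, two adjacent tabs count as a single separator, so an empty field vanishes and the remaining values shift left. The sketch below uses a made-up three-column sample, not the real file, to show the difference.

```r
# Hypothetical sample: three tab-separated fields per record;
# the middle field of the second record is empty.
txt <- "a\t1\tx\nb\t\ty\n"

# Default sep = "" splits on runs of whitespace, so the empty field is lost.
# fill = TRUE pads the short row instead of stopping with an error.
ws <- read.table(text = txt, header = FALSE, fill = TRUE,
                 na.strings = c("", "NA"))

# sep = "\t" treats every tab as a field boundary, so the empty field
# is kept and read as NA.
tab <- read.table(text = txt, header = FALSE, sep = "\t",
                  na.strings = c("", "NA"))

print(ws)   # second row is shifted: "y" lands in column V2
print(tab)  # second row keeps its shape: V2 is NA, V3 is "y"
```
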
Characteristics of the resulting data frame:

> dim(data)
[1] 100000     40
> t(lapply(data, class))
     Label    I1        I2        I3        I4        I5        I6        I7        I8       
[1,] "factor" "numeric" "numeric" "numeric" "numeric" "numeric" "numeric" "numeric" "numeric"
     I9        I10       I11       I12       I13       C1       C2       C3       C4      
[1,] "numeric" "numeric" "numeric" "numeric" "numeric" "factor" "factor" "factor" "factor"
     C5       C6       C7       C8       C9       C10      C11      C12      C13      C14     
[1,] "factor" "factor" "factor" "factor" "factor" "factor" "factor" "factor" "factor" "factor"
     C15      C16      C17      C18      C19      C20      C21      C22      C23      C24     
[1,] "factor" "factor" "factor" "factor" "factor" "factor" "factor" "factor" "factor" "factor"
     C25      C26     
[1,] "factor" "factor"
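
As a further check, one could confirm that every line really has the expected number of fields and see how many values are missing per column. The following is only a sketch on a tiny made-up file with four columns; the temporary file, the column names and the name sample_df are illustrative stand-ins for dac_sample.txt and its 40 columns.

```r
# Write a small tab-separated sample to a temporary file (hypothetical data).
tmp <- tempfile(fileext = ".txt")
writeLines(c("0\t1\t\t68fd1e64",
             "1\t\t5\t",
             "0\t2\t3\ta73ee510"), tmp)

# Count the fields on each line with the same separator used for reading.
n_fields <- count.fields(tmp, sep = "\t")
print(table(n_fields))

# Read the sample with the same options as in the answer, scaled down to 4 columns.
sample_df <- read.table(tmp, sep = "\t", header = FALSE,
                        na.strings = c("", " ", "NA"),
                        col.names = c("Label", "I1", "I2", "C1"),
                        colClasses = c("factor", "numeric", "numeric", "factor"))

# Number of missing values in each column.
print(colSums(is.na(sample_df)))

unlink(tmp)  # remove the temporary file
```

If table(n_fields) on the real file shows more than one distinct count, the separator or quoting options probably do not match the file.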