我想打开一个包含40个变量的缺失数据的文本文件到40列的数据框中。但是,当我使用传统的read.csv.
时,数据读取不正确,数据框只有38列。我猜测丢失的数据有效果。
这是文本文件的示例:
0 1 1 5 0 1382 4 15 2 181 1 2 2 68fd1e64 80e26c9b fb936136 7b4723c4 25c83c98 7e0ccccf de7995b8 1f89b562 a73ee510 a8cd5504 b2cb9c98 37c9c164 2824a5f6 1adce6ef 8ba8b39a 891b62e7 e5ba7672 f54016b9 21ddcdc9 b1252a9d 07b5194c 3a171ecb c5c50484 e8b83407 9727dd16
0 2 0 44 1 102 8 2 2 4 1 1 4 68fd1e64 f0cf0024 6f67f7e5 41274cd7 25c83c98 fe6b92e5 922afcc0 0b153874 a73ee510 2b53e5fb 4f1b46f3 623049e6 d7020589 b28479f6 e6c5b5cd c92f3b61 07c540c4 b04e4670 21ddcdc9 5840adea 60f6221e 3a171ecb 43f13e8b e8b83407 731c3655
0 2 0 1 14 767 89 4 2 245 1 3 3 45 287e684f 0a519c5c 02cf9876 c18be181 25c83c98 7e0ccccf c78204a1 0b153874 a73ee510 3b08e48b 5f5e6091 8fe001f4 aa655a2f 07d13a8f 6dc710ed 36103458 8efede7f 3412118d e587c466 ad3062eb 3a171ecb 3b183c5c
0 893 4392 0 0 0 0 68fd1e64 2c16a946 a9a87e68 2e17d6f6 25c83c98 fe6b92e5 2e8a689b 0b153874 a73ee510 efea433b e51ddf94 a30567ca 3516f6e6 07d13a8f 18231224 52b8680f 1e88c74f 74ef3502 6b3a5ca6 3a171ecb 9117a34a
0 3 -1 0 2 0 3 0 0 1 1 0 8cf07265 ae46a29d c81688bb f922efad 25c83c98 13718bbd ad9fa255 0b153874 a73ee510 5282c137 e5d8af57 66a76a26 f06c53ac 1adce6ef 8ff4b403 01adbab4 1e88c74f 26b3c7a7 21c9516a 32c7478e b34f3128
0 -1 12824 0 0 6 0 05db9164 6c9c9cf3 2730ec9c 5400db8b 43b19349 6f6d9be8 53b5f978 0b153874 a73ee510 3b08e48b 91e8fc27 be45b877 9ff13f22 07d13a8f 06969a20 9bc7fff5 776ce399 92555263 242bb710 8ec974f4 be7c41b4 72c78f11
0 1 2 3168 0 1 2 0 439a44a4 ad4527a2 c02372d0 d34ebbaa 43b19349 fe6b92e5 4bc6ffea 0b153874 a73ee510 3b08e48b a4609aab 14d63538 772a00d7 07d13a8f f9d1382e b00d3dc9 776ce399 cdfa8259 20062612 93bad2c0 1b256e61
1 1 4 2 0 0 0 1 0 0 1 1 0 68fd1e64 2c16a946 503b9dbc e4dbea90 f3474129 13718bbd 38eb9cf4 1f89b562 a73ee510 547c0ffe bc8c9f21 60ab2f07 46f42a63 07d13a8f 18231224 e6b6bdc7 e5ba7672 74ef3502 5316a17f 32c7478e 9117a34a
0 44 4 8 19010 249 28 31 141 1 8 05db9164 d833535f d032c263 c18be181 25c83c98 7e0ccccf d5b6acf2 0b153874 a73ee510 2acdcf4e 086ac2d2 dfbb09fb 41a6ae00 b28479f6 e2502ec9 84898b2a e5ba7672 42a2edb9 0014c32a 32c7478e 3b183c5c
0 35 1 33737 21 1 2 3 1 1 05db9164 510b40a5 d03e7c24 eb1fd928 25c83c98 52283d1c 0b153874 a73ee510 015ac893 e51ddf94 951fe4a9 3516f6e6 07d13a8f 2ae4121c 8ec71479 d4bb7bd8 70d0f5f9 0e63fca0 32c7478e 0e8fe315
0 2 632 0 56770 0 5 65 0 2 05db9164 0468d672 7ae80d0f 80d8555a 25c83c98 7e0ccccf 04277bf9 0b153874 7cc72ec2 3b08e48b 7e2c5c15 cfc86806 91a1b611 b28479f6 58251aab 146a70fd 776ce399 0b331314 21ddcdc9 5840adea cbec39db 3a171ecb cedad179 ea9a246c 9a556cfc
0 0 6 6 6 421 109 1 7 107 0 1 6 05db9164 9b5fd12f 4cf72387 111121f4 0b153874 a73ee510 3b08e48b ac9c2e8f 6e2d6a15 07d13a8f 796a1a2e d4bb7bd8 8aaa5b67 32c7478e
1 0 -1 1465 0 17 0 4 0 4 241546e0 38a947a1 fa673455 6a14f9b9 25c83c98 fe6b92e5 1c86e0eb 1f89b562 a73ee510 e7ba2569 755e4a50 208d9687 5978055e 07d13a8f 5182f694 f8b34416 e5ba7672 e5f8f18f f3ddd519 32c7478e b34f3128
1 2 11 5 10262 34 2 4 5 1 5 be589b51 287130e0 cd7a7a22 fb7334df 25c83c98 6cdb3998 361384ce a73ee510 3ff10fb2 5874c9c9 976cbd4c 740c210d 1adce6ef 310d155b 07eb8110 07c540c4 891589e7 18259a83 a458ea53 a0ab60ca 32c7478e a052b1ed 9b3e8820 8967c0d2
0 0 51 84 4 3633 26 1 4 8 0 1 4 5a9ed9b0 80e26c9b 97144401 5dbf0cc5 0942e0a7 13718bbd 9ce6136d 0b153874 a73ee510 2106e595 b5bb9d63 04f55317 ab04d8fe 1adce6ef 0ad47a49 2bd32e5c 3486227d 12195b22 21ddcdc9 b1252a9d fa131867 dbb486d7 8ecc176a e8b83407 c43c3f58
0 2 1 18 20255 0 1 1306 0 20 05db9164 bc6e3dc1 67799c69 d00d0f35 4cf72387 7e0ccccf ca4fd8f8 64523cfa a73ee510 3b08e48b a0060bca b9f28c33 22d23aac 5aebfb83 d702713a 0f655650 776ce399 3a2028fd b426bc93 3a171ecb 2e0a0035
1 1 987 2 105 2 1 2 2 1 1 2 68fd1e64 38d50e09 da603082 431a5096 43b19349 7e0ccccf 3f35b640 0b153874 a73ee510 3b08e48b 3d5fb018 6aaab577 94172618 07d13a8f ee569ce2 2f03ef40 d4bb7bd8 582152eb 21ddcdc9 b1252a9d 3b203ca1 32c7478e b21dc903 001f3601 aa5f0a15
0 0 1 0 16597 557 3 5 123 0 1 1 8cf07265 7cd19acc 77f2f2e5 d16679b9 4cf72387 fbad5c96 8fb24933 0b153874 a73ee510 0095a535 3617b5f5 9f32b866 428332cf b28479f6 83ebd498 31ca40b6 e5ba7672 d0e5eb07 dfcfc3fa ad3062eb 32c7478e aee52b6f
0 0 24 4 2 2056 12 6 10 83 0 1 2 05db9164 f0cf0024 08b45d8b cbb5af1b 384874ce fbad5c96 81bb0302 37e4aa92 a73ee510 175d6c71 b7094596 1c547463 1f9d2c38 1adce6ef 55dc357b 0ca69655 e5ba7672 b04e4670 21ddcdc9 b1252a9d f3caefdd 32c7478e 4c8e5aef ea9a246c 9593bba9
0 7 102 3 780 15 7 15 15 1 1 3 3c9d8785 b0660259 3a960356 15c92ddb 4cf72387 13718bbd 00c46cd1 0b153874 a73ee510 62cfc6bd 8cffe207 656e5413 ff5626de ad1cc976 27b1230c fa8d05aa e5ba7672 5edd90de e12ce348 c3dc6cef 49045073
1 47 0 6399 38 19 10 143 10 6 1464facd 38a947a1 223b0e16 ca55061c 25c83c98 7e0ccccf 6933dec1 5b392875 a73ee510 3b08e48b 860c302b 156f99ef 30735474 1adce6ef 0e78291e 5fbf4a84 e5ba7672 1999bae9 deb9605d 32c7478e e448275f
0 0 1 80 0 1848 287 1 4 46 0 1 4 05db9164 09e68b86 13b87f72 13a91973 25c83c98 7e0ccccf cc5ed2f1 0b153874 a73ee510 3b08e48b 081c279a d25f00b6 9f16a973 07d13a8f 36721ddc 1746d357 d4bb7bd8 5aed7436 a153cea2 a458ea53 dd37e0
答案 0 :(得分:0)
在zwol的帮助下,我使用了这段代码,并提供了额外的功能:
data <- read.table(file = "dac_sample.txt", sep="\t", header=FALSE,
na.strings = c("", " ", "NA"),
col.names = c("Label", "I1", "I2", "I3", "I4", "I5", "I6", "I7", "I8", "I9",
"I10", "I11", "I12", "I13", "C1", "C2", "C3", "C4", "C5", "C6",
"C7", "C8", "C9", "C10", "C11", "C12", "C13", "C14", "C15",
"C16", "C17", "C18", "C19", "C20", "C21", "C22", "C23", "C24",
"C25", "C26"),
colClasses = c("factor", "numeric", "numeric", "numeric", "numeric",
"numeric", "numeric", "numeric", "numeric", "numeric",
"numeric", "numeric", "numeric", "numeric", "factor",
"factor", "factor", "factor", "factor", "factor", "factor",
"factor", "factor", "factor", "factor", "factor", "factor",
"factor", "factor", "factor", "factor", "factor", "factor",
"factor", "factor", "factor", "factor", "factor", "factor",
"factor"))
数据框的特征:
> dim(data)
[1] 100000 40
> t(lapply(data, class))
Label I1 I2 I3 I4 I5 I6 I7 I8
[1,] "factor" "numeric" "numeric" "numeric" "numeric" "numeric" "numeric" "numeric" "numeric"
I9 I10 I11 I12 I13 C1 C2 C3 C4
[1,] "numeric" "numeric" "numeric" "numeric" "numeric" "factor" "factor" "factor" "factor"
C5 C6 C7 C8 C9 C10 C11 C12 C13 C14
[1,] "factor" "factor" "factor" "factor" "factor" "factor" "factor" "factor" "factor" "factor"
C15 C16 C17 C18 C19 C20 C21 C22 C23 C24
[1,] "factor" "factor" "factor" "factor" "factor" "factor" "factor" "factor" "factor" "factor"
C25 C26
[1,] "factor" "factor"