I have a dataset like this:
43907 120 101
11,31,65,67 0:0.380880 1:0.494080 2:0.540010 3:0.422930 4:0.158320 5:0.326980 6:0.390860 7:0.527120 8:0.254050 9:0.223730 10:0.040290 11:0.141130 12:0.112250 13:0.263170 14:0.147020 15:0.472410 16:0.592610 17:0.653140 18:0.499870 19:0.196520 20:0.403890 21:0.482400 22:0.619220 23:0.320350 24:0.281250 25:0.054750 26:0.180460 27:0.139960 28:0.319930 29:0.181220 30:0.364290 31:0.407210 32:0.368930 33:0.427660 34:0.211390 35:0.364340 36:0.370710 37:0.409110 38:0.289300 39:0.243050 40:0.063120 41:0.193590 42:0.158760 43:0.316050 44:0.197410 45:0.656170 46:0.678760 47:0.650830 48:0.674640 49:0.492430 50:0.623890 51:0.610620 52:0.678220 53:0.574770 54:0.523070 55:0.206800 56:0.496290 57:0.429220 58:0.586610 59:0.471550 60:0.284480 61:0.432470 62:0.498070 63:0.408140 64:0.102710 65:0.303030 66:0.309500 67:0.444860 68:0.191730 69:0.174890 70:0.034140 71:0.153100 72:0.068320 73:0.217020 74:0.099690 75:0.409860 76:0.561920 77:0.612030 78:0.514470 79:0.146020 80:0.398810 81:0.383290 82:0.548490 83:0.282940 84:0.252710 85:0.051010 86:0.223110 87:0.098110 88:0.299670 89:0.144870 90:0.308490 91:0.358480 92:0.352080 93:0.394690 94:0.157510 95:0.339370 96:0.321560 97:0.341370 98:0.247970 99:0.206070 100:0.061000 101:0.216790 102:0.112390 103:0.273650 104:0.152740 105:0.598080 106:0.621690 107:0.607210 108:0.644020 109:0.394950 110:0.593650 111:0.551530 112:0.574390 113:0.511030 114:0.464000 115:0.202030 116:0.492340 117:0.317980 118:0.547810 119:0.393780
31,33,67 0:0.449570 1:0.460490 2:0.453470 3:0.410780 4:0.231760 5:0.402150 6:0.349590 7:0.536460 8:0.318120 9:0.301620 10:0.063840 11:0.220340 12:0.184360 13:0.309230 14:0.216980 15:0.513320 16:0.517750 17:0.529540 18:0.479400 19:0.268830 20:0.464330 21:0.411790 22:0.633740 23:0.362320 24:0.354890 25:0.078480 26:0.260790 27:0.220420 28:0.356290 29:0.253430 30:0.399230 31:0.371270 32:0.337540 33:0.399480 34:0.272790 35:0.414420 36:0.335390 37:0.414630 38:0.328620 39:0.296320 40:0.088510 41:0.264240 42:0.221650 43:0.350630 44:0.256610 45:0.662580 46:0.592860 47:0.565150 48:0.626380 49:0.560600 50:0.669770 51:0.567070 52:0.673730 53:0.566180 54:0.560820 55:0.300700 56:0.564590 57:0.507360 58:0.618470 59:0.521170 60:0.357100 61:0.435480 62:0.505530 63:0.444140 64:0.147280 65:0.368310 66:0.305340 67:0.501230 68:0.241660 69:0.233360 70:0.049390 71:0.215940 72:0.103650 73:0.271220 74:0.146740 75:0.416700 76:0.496200 77:0.586400 78:0.504660 79:0.178360 80:0.425060 81:0.366600 82:0.568510 83:0.284050 84:0.282370 85:0.063300 86:0.260140 87:0.127270 88:0.319830 89:0.179630 90:0.349800 91:0.351150 92:0.358620 93:0.409720 94:0.196110 95:0.380290 96:0.313520 97:0.378220 98:0.275040 99:0.248510 100:0.076540 101:0.266020 102:0.145370 103:0.311140 104:0.192090 105:0.618950 106:0.597790 107:0.601750 108:0.646850 109:0.414880 110:0.627460 111:0.539560 112:0.638610 113:0.496370 114:0.480990 115:0.199590 116:0.535080 117:0.323830 118:0.571490 119:0.397560
The first line says there are 43907 rows of data in total, with 101 possible classes and 120 feature dimensions. How can I read this kind of dataset in Python?

trainX = []
trainY = []

trainX[0]
Expected output: 0.494080, 0.540010, 0.422930, ..., 0.393780

trainY[0]
Expected output: 11,31,65,67

Thanks a lot.
Answer 0 (score: 1):
with open('file.txt') as file:
    head = file.readline()
    # header: <number of rows> <number of feature dimensions> <number of classes>
    n, dim, classes = map(int, head.split())
    print(n, dim, classes)
    train_y = []
    train_x = []
    for line in file:
        line = line.strip()
        if line:
            data = line.split()
            labels = data[0]               # comma-separated class labels, e.g. "11,31,65,67"
            print('labels:', labels)
            train_y.append(labels)
            data = data[1:]
            data = [el.split(':')[1] for el in data]  # drop the "index:" prefix
            data = [float(el) for el in data]         # convert to float
            print('data', len(data), data)
            train_x.append(data)
Output:
43907 120 101
11,31,65,67
120 [0.38088, 0.49408, 0.54001, 0.42293, 0.15832, 0.32698, 0.39086, 0.52712, 0.25405, 0.22373, 0.04029, 0.14113, 0.11225, 0.26317, 0.14702, 0.47241, 0.59261, 0.65314, 0.49987, 0.19652, 0.40389, 0.4824, 0.61922, 0.32035, 0.28125, 0.05475, 0.18046, 0.13996, 0.31993, 0.18122, 0.36429, 0.40721, 0.36893, 0.42766, 0.21139, 0.36434, 0.37071, 0.40911, 0.2893, 0.24305, 0.06312, 0.19359, 0.15876, 0.31605, 0.19741, 0.65617, 0.67876, 0.65083, 0.67464, 0.49243, 0.62389, 0.61062, 0.67822, 0.57477, 0.52307, 0.2068, 0.49629, 0.42922, 0.58661, 0.47155, 0.28448, 0.43247, 0.49807, 0.40814, 0.10271, 0.30303, 0.3095, 0.44486, 0.19173, 0.17489, 0.03414, 0.1531, 0.06832, 0.21702, 0.09969, 0.40986, 0.56192, 0.61203, 0.51447, 0.14602, 0.39881, 0.38329, 0.54849, 0.28294, 0.25271, 0.05101, 0.22311, 0.09811, 0.29967, 0.14487, 0.30849, 0.35848, 0.35208, 0.39469, 0.15751, 0.33937, 0.32156, 0.34137, 0.24797, 0.20607, 0.061, 0.21679, 0.11239, 0.27365, 0.15274, 0.59808, 0.62169, 0.60721, 0.64402, 0.39495, 0.59365, 0.55153, 0.57439, 0.51103, 0.464, 0.20203, 0.49234, 0.31798, 0.54781, 0.39378]
31,33,67
120 [0.44957, 0.46049, 0.45347, 0.41078, 0.23176, 0.40215, 0.34959, 0.53646, 0.31812, 0.30162, 0.06384, 0.22034, 0.18436, 0.30923, 0.21698, 0.51332, 0.51775, 0.52954, 0.4794, 0.26883, 0.46433, 0.41179, 0.63374, 0.36232, 0.35489, 0.07848, 0.26079, 0.22042, 0.35629, 0.25343, 0.39923, 0.37127, 0.33754, 0.39948, 0.27279, 0.41442, 0.33539, 0.41463, 0.32862, 0.29632, 0.08851, 0.26424, 0.22165, 0.35063, 0.25661, 0.66258, 0.59286, 0.56515, 0.62638, 0.5606, 0.66977, 0.56707, 0.67373, 0.56618, 0.56082, 0.3007, 0.56459, 0.50736, 0.61847, 0.52117, 0.3571, 0.43548, 0.50553, 0.44414, 0.14728, 0.36831, 0.30534, 0.50123, 0.24166, 0.23336, 0.04939, 0.21594, 0.10365, 0.27122, 0.14674, 0.4167, 0.4962, 0.5864, 0.50466, 0.17836, 0.42506, 0.3666, 0.56851, 0.28405, 0.28237, 0.0633, 0.26014, 0.12727, 0.31983, 0.17963, 0.3498, 0.35115, 0.35862, 0.40972, 0.19611, 0.38029, 0.31352, 0.37822, 0.27504, 0.24851, 0.07654, 0.26602, 0.14537, 0.31114, 0.19209, 0.61895, 0.59779, 0.60175, 0.64685, 0.41488, 0.62746, 0.53956, 0.63861, 0.49637, 0.48099, 0.19959, 0.53508, 0.32383, 0.57149, 0.39756]
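If you want proper arrays rather than nested Python lists, here is a minimal follow-up sketch that reuses the train_x, train_y, and classes variables built above. It assumes NumPy is available; the trainX/trainY names and the multi-hot label encoding are just one possible convention, not part of the original answer.

import numpy as np

# Stack the per-row feature lists into a (n_rows, 120) float array.
trainX = np.asarray(train_x, dtype=np.float32)

# Split each comma-separated label string (e.g. "11,31,65,67") into ints,
# then build a (n_rows, 101) multi-hot matrix for multi-label training.
label_lists = [[int(l) for l in labels.split(',')] for labels in train_y]
trainY = np.zeros((len(label_lists), classes), dtype=np.int8)
for row, labs in enumerate(label_lists):
    trainY[row, labs] = 1

print(trainX.shape, trainY.shape)   # expected: (43907, 120) (43907, 101)

With this, trainX[0] gives the 120 feature values of the first row and np.nonzero(trainY[0])[0] gives back its label indices.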