I have some lists in Python with the following format:
features = [(array([ 2, 5, 7, 15, 15, 14, 1, 1, 0, 4, 4, 3, 6,
10, 11, 12, 13, 9, 8, 18, 17, 17, 18, 16, 16, 17,
21, 20, 19, 25, 24, 24, 23, 23, 23, 22, 29, 29, 30,
31, 28, 27, 33, 33, 33, 35, 39, 39, 39, 42, 41, 44,
43, 26, 32, 32, 33, 34, 37, 37, 36, 37, 37, 37, 38,
39, 39, 40, 42, 42, 50, 49, 48, 46, 45, 51, 52, 59,
57, 56, 47, 58, 54, 55, 53, 52, 60, 61, 62, 63, 64,
64, 70, 70, 69, 64, 64, 64, 65, 71, 71, 65, 65, 65,
68, 67, 66, 66, 70, 71, 71, 72, 73, 74, 75, 73, 78,
76, 77, 77, 81, 81, 83, 82, 81, 78, 80, 79, 84, 85,
86, 84, 88, 87, 88, 91, 87, 93, 93, 92, 92, 88, 90,
89, 95, 94, 98, 99, 99, 95, 95, 97, 96, 102, 101, 101,
100, 106, 106, 107, 106, 105, 102, 102, 104, 103, 103, 118, 118,
122, 110, 113, 113, 119, 122, 109, 114, 117, 120, 123, 108, 108,
115, 115, 116, 116, 121, 121, 124, 124, 111, 112, 112, 125,128,]),... ]))]
len(features) = 24073
len(features[n]) = 5
len(features[0][0]) = 397
len(features[1][0]) = 171
labels = [[0,0,0,0,0,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0],
[0,0,0,0,0,0,0,0,0,...]]
len(labels) = 70871704
len(labels[0]) = 397
len(labels[1]) = 315
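How can I homogenize them so that both lists have the same length and their nested lists also all have the same length? They come from applying regular expressions to some OCR data, which gave me the coordinates of the boxes the text was extracted from in the images.
All the features are already encoded, and the labels are binary, where 0 means no and 1 means yes.
I need these for a Naive Bayes classification problem.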
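To make "homogenize" concrete, here is a minimal sketch of the kind of result I am after: every inner sequence right-padded to the length of the longest one. The helper name `pad_to_same_length`, the toy data, and the choice of 0 as the padding value are all assumptions for illustration. Note that 0 already occurs as a real value in both the features and the labels, so a different sentinel may be needed.

```python
import numpy as np

def pad_to_same_length(sequences, pad_value=0):
    """Right-pad every 1-D sequence with pad_value up to the longest length."""
    max_len = max(len(seq) for seq in sequences)
    # Start from an array filled with the pad value, then copy each sequence in.
    padded = np.full((len(sequences), max_len), pad_value, dtype=np.int64)
    for row, seq in enumerate(sequences):
        padded[row, :len(seq)] = seq
    return padded

# Toy stand-ins for features[n][0] and labels[n]; the real data is much larger.
toy_features = [np.array([2, 5, 7, 15]), np.array([1, 0])]
toy_labels = [[0, 1, 1, 0], [0, 0, 1]]

X = pad_to_same_length(toy_features)
y = pad_to_same_length(toy_labels)
print(X.shape, y.shape)
```

This only equalizes the inner lengths within each list. It does not reconcile the different outer lengths of `features` and `labels`, which I would still have to align separately.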
Thanks a lot, and sorry if I am asking for too much :(