我有csv文件,它们是1200行x 3列。行数可以从低至500到高达5000不等,但列保持不变。
我想从这些文件中创建一个特征向量,从而保持一致的单元格/向量长度&从而有助于找出这些载体之间的距离。
FILE_1
A, B, C
(267.09669678867186, 6.3664069175720197, 1257325.5809999991),
(368.24070923984374, 9.0808353424072301, 49603.662999999884),
(324.21470826328124, 11.489830970764199, 244391.04699999979),
(514.33452027500005, 7.5162401199340803, 56322.424999999988),
(386.19673340976561, 9.4927110671997106, 175958.77100000033),
(240.09965330898439, 10.3463039398193, 457819.8519411764),
(242.17559998691405, 8.4401674270629901, 144891.51100000029),
(314.23066895664061, 7.4405002593994096, 58433.818999999959),
(933.3073596304688, 7.1564397811889604, 41977.960000000014),
(274.04136473476564, 4.8482465744018599, 48782.314891525479),
(584.2639294320312, 7.90128517150879, 49730.705000000096),
(202.13173096835936, 10.559995651245099, 20847.805144088608),
(324.98563963710939, 2.2546300888061501, 43767.774800000007),
(464.35059935390626, 11.573680877685501, 1701597.3915132943),
(776.28339964687495, 8.7755222320556605, 106882.2469999999),
(310.11652952968751, 10.3175926208496, 710341.19162800116),
(331.19962889492189, 10.7578010559082, 224621.80632433048),
(452.31337752387947, 7.3100395202636701, 820707.26700000139),
(430.16615111171876, 10.134071350097701, 18197.691999999963),
(498.24687010585939, 11.0102319717407, 45423.269964585743),
.....,
.....,
500th row
FILE_2
(363.02781861484374, 8.8369808197021502, 72898.479666666608),
(644.20353882968755, 8.6263589859008807, 22776.78799999999),
(259.25105469882811, 9.8575859069824201, 499615.64068339905),
(410.19474608242189, 9.8795070648193395, 316146.18800000293),
(288.12153809726561, 4.7451887130737296, 58615.577999999943),
(376.25868409335936, 10.508985519409199, 196522.12200000012),
(261.11118895351564, 8.5228433609008807, 32721.110000000026),
(319.98896605312501, 3.2100667953491202, 60587.077000000027),
(286.94926268398439, 4.7687568664550799, 47842.133999999867),
(121.00206177890625, 7.9372291564941397, 239813.20531182736),
(308.19895750820314, 6.0029039382934597, 26354.519000000011),
(677.17011839687495, 9.0299625396728498, 10391.757655172449),
(182.1304913216797, 8.0010566711425799, 145583.55700000061),
(187.06341736972655, 9.9460496902465803, 77488.229000000007),
(144.07867615878905, 3.6044106483459499, 104651.56499999999),
(288.92317015468751, 4.3750333786010698, 151872.1949999998),
(228.2089825326172, 4.4475774765014604, 658120.07628214348),
(496.18831055820311, 11.422966003418001, 2371155.6659999997),
(467.30134398281251, 11.0771179199219, 109702.48440899582),
(163.08418089687501, 5.7271881103515598, 38107.106791666629),
.....,
.....,
3400th row
您可以看到两个文件之间没有对应关系,即如果有人要求您计算这两个矢量之间的距离,则无法实现。
目的是能够以这种方式插入两个文件的行,以便在所有这些文件之间保持一致。即当我抬头看第一排, 它应该代表所有文件的相同功能。现在看看FILE_1
Range of values for three columns is (considering only 20 rows for time being)
A: 202.13173096835936,933.3073596304688
B: 2.2546300888061501, 11.573680877685501
C: 18197.691999999963,1701597.3915132943
我想将这些点放在一个3d数组上,其网格大小为.1X.1X.1(或者说10X10X10或任意大小的网格单元格) 但为了实现这一目标,我们需要规范化数据(均值标准化等)
现在我们拥有的数据是一个3d数据,需要对其进行规范化,以便将它们插入到这个3d数组中。即使它是一个矢量,它也不需要3d。
现在,当我说我需要对点进行平均时,我的意思是,如果在一个单元格中有两个以上的点发生下降(如果单元格大小很大,例如100X100X100则会发生),那么我们将取平均值x,y,z坐标作为该单元格的值。
这些插值向量将具有相同的长度和宽度。对应,因为当与其他此类向量相比时,向量的对应点将表示相同的点。
**注:Min&所有文件中所有坐标的最大范围是100:1000,2:12,10000:2000000