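I'm having a lot of trouble reading an .xls file into my machine learning project. The data I need to extract is stored in an .xls file, and I can't find any easy way to load it into a TensorFlow dataset. Can anyone help?
Link to the data: http://archive.ics.uci.edu/ml/machine-learning-databases/00192/BreastTissue.xls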
Answer 0 (score: 1)
Try using the Pandas module:
import pandas as pd
In [24]: df = pd.read_excel(r'D:\download\BreastTissue.xls', sheet_name='Data')
In [25]: df
Out[25]:
     Case #  Class           I0     PA500        HFS           DA           Area        A/DA      Max IP          DR            P
  0       1    car   524.794072  0.187448   0.032114   228.800228    6843.598481   29.910803   60.204880  220.737212   556.828334
  1       2    car   330.000000  0.226893   0.265290   121.154201    3163.239472   26.109202   69.717361   99.084964   400.225776
  2       3    car   551.879287  0.232478   0.063530   264.804935   11888.391827   44.894903   77.793297  253.785300   656.769449
  3       4    car   380.000000  0.240855   0.286234   137.640111    5402.171180   39.248524   88.758446  105.198568   493.701814
  4       5    car   362.831266  0.200713   0.244346   124.912559    3290.462446   26.342127   69.389389  103.866552   424.796503
  5       6    car   389.872978  0.150098   0.097738   118.625814    2475.557078   20.868620   49.757149  107.686164   429.385788
  6       7    car   290.455141  0.144164   0.053058    74.635067    1189.545213   15.938154   35.703331   65.541324   330.267293
  7       8    car   275.677393  0.153938   0.187797    91.527893    1756.234837   19.187974   39.305183   82.658682   331.588302
  8       9    car   470.000000  0.213105   0.225497   184.590057    8185.360837   44.343455   84.482483  164.122511   603.315715
  9      10    car   423.000000  0.219562   0.261799   172.371241    6108.106297   35.435762   79.056351  153.172903   558.274515
 ..     ...    ...          ...       ...        ...          ...            ...         ...         ...         ...          ...
 96      97    adi  1650.000000  0.047647   0.043284   274.426177    5824.895192   21.225727   81.239571  262.125656  1603.070348
 97      98    adi  2800.000000  0.083078   0.184307   583.259257   31388.652882   53.815953  298.582977  501.038494  2896.582483
 98      99    adi  2329.840138  0.066148   0.353255   377.253368   25369.039925   67.246689  336.075165  171.387227  2686.435346
 99     100    adi  2400.000000  0.084125   0.220610   596.041956   37939.255571   63.651988  261.348175  535.689409  2447.772353
100     101    adi  2000.000000  0.067195   0.124267   330.271646   15381.097687   46.571051  169.197983  283.639564  2063.073212
101     102    adi  2000.000000  0.106989   0.105418   520.222649   40087.920984   77.059161  204.090347  478.517223  2088.648870
102     103    adi  2600.000000  0.200538   0.208043  1063.441427  174480.476218  164.071543  418.687286  977.552367  2664.583623
103     104    adi  1600.000000  0.071908  -0.066323   436.943603   12655.342135   28.963331  103.732704  432.129749  1475.371534
104     105    adi  2300.000000  0.045029   0.136834   185.446044    5086.292497   27.427344  178.691742   49.593290  2480.592151
105     106    adi  2600.000000  0.069988   0.048869   745.474369   39845.773698   53.450226  154.122604  729.368395  2545.419744
[106 rows x 11 columns]
In [26]: df.dtypes
Out[26]:
Case #      int64
Class      object
I0        float64
PA500     float64
HFS       float64
DA        float64
Area      float64
A/DA      float64
Max IP    float64
DR        float64
P         float64
dtype: object
In [27]: df.shape
Out[27]: (106, 11)
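
Once the sheet is in a DataFrame, getting it into TensorFlow is mostly a matter of splitting features from labels. Below is a minimal sketch of one way to do that, not the only one. The helper names `frame_to_arrays` and `arrays_to_dataset` are made up for illustration, and it assumes `Class` is the target, `Case #` is just a row ID to discard, and a batch size of 16 is acceptable. Note that reading .xls files with `pd.read_excel` generally needs the `xlrd` package installed.

```python
import numpy as np
import pandas as pd


def frame_to_arrays(df, label_col="Class", drop_cols=("Case #",)):
    """Split a DataFrame into a float32 feature matrix and integer labels.

    Hypothetical helper: assumes label_col holds class names and that
    every column not listed in drop_cols is a numeric feature.
    """
    # Map each class name (e.g. 'car', 'adi') to an integer code.
    labels = df[label_col].astype("category")
    class_names = list(labels.cat.categories)
    y = labels.cat.codes.to_numpy(dtype=np.int64)
    # Everything except the label and the ID column is treated as a feature.
    X = df.drop(columns=[label_col, *drop_cols]).to_numpy(dtype=np.float32)
    return X, y, class_names


def arrays_to_dataset(X, y, batch_size=16):
    """Wrap NumPy arrays in a shuffled, batched tf.data.Dataset."""
    # Imported here so the pandas-only step above works without TensorFlow.
    import tensorflow as tf

    ds = tf.data.Dataset.from_tensor_slices((X, y))
    return ds.shuffle(len(X)).batch(batch_size)
```

With the `df` from above, usage would look like `X, y, class_names = frame_to_arrays(df)` followed by `ds = arrays_to_dataset(X, y)`. The features are left unscaled here; since columns such as `Area` and `PA500` differ by several orders of magnitude, you will probably want to normalize them before training.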