I have an array in which the first column is the class (as an integer) and the remaining columns are the features.
How can I turn this into a scikit-learn compatible dataset, so that I can call something like mydataset = datasets.load_mydataset()?
Answer 0: (score: 4)
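You can simply use pandas. For example, suppose you have copied your dataset into a dataset.csv file. Just make sure the columns are labelled properly in the header row of the CSV file.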
In [1]: import pandas as pd

In [2]: df = pd.read_csv('dataset.csv')

In [3]: df
Out[3]:
   Label  f1  f2  f3  f4
0      1   0  34  23   2
1      0   0  21  11   0
2      3  11   2  11   1

In [4]: y_train = df['Label']

In [5]: x_train = df.drop('Label', axis=1)

In [6]: x_train
Out[6]:
   f1  f2  f3  f4
0   0  34  23   2
1   0  21  11   0
2  11   2  11   1

In [7]: y_train
Out[7]:
0    1
1    0
2    3
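
If you want something closer to the datasets.load_*() call in the question, one possible approach is to wrap the pandas steps above in a small function that returns a Bunch, the dict-like container that scikit-learn's own loaders return. The following is only a sketch under assumptions: load_mydataset is a hypothetical helper written here (it is not part of scikit-learn), the class column is assumed to be named Label as in the session above, and the in-memory CSV just reuses the three sample rows so the snippet runs without a file on disk.

```python
import io

import pandas as pd
from sklearn.utils import Bunch


def load_mydataset(path_or_buffer, label_column="Label"):
    """Hypothetical loader: read a CSV and return a Bunch with data and target.

    Assumes the CSV has a header row and that `label_column` holds the class.
    """
    df = pd.read_csv(path_or_buffer)
    features = df.drop(label_column, axis=1)
    return Bunch(
        data=features.to_numpy(),              # shape (n_samples, n_features)
        target=df[label_column].to_numpy(),    # shape (n_samples,)
        feature_names=list(features.columns),
    )


# Same three rows as in the session above, kept in memory for the demo.
sample_csv = io.StringIO(
    "Label,f1,f2,f3,f4\n"
    "1,0,34,23,2\n"
    "0,0,21,11,0\n"
    "3,11,2,11,1\n"
)

mydataset = load_mydataset(sample_csv)
print(mydataset.data.shape)
print(mydataset.target)
print(mydataset.feature_names)
```

With a real file you would pass its path instead, for example load_mydataset('dataset.csv'), and then hand mydataset.data and mydataset.target to an estimator's fit method.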