How do I create a scikit-learn dataset?

Time: 2016-01-22 11:53:35

Tags: csv machine-learning dataset scikit-learn

I have an array where the first column holds the class (as an integer) and the remaining columns are the features.

Say, something like this:

    1,0,34,23,2
    0,0,21,11,0
    3,11,2,11,1

How do I convert this into a scikit-learn compatible dataset, so that I can call something like mydataset = datasets.load_mydataset()?

1 answer:

Answer 0: (score: 4)

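You can simply use pandas. For example, suppose you have copied your dataset into a dataset.csv file. Just label the columns properly in the CSV file (a header row such as Label,f1,f2,f3,f4), then split the label column from the feature columns: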

    In [1]: import pandas as pd

    In [2]: df = pd.read_csv('dataset.csv')

    In [3]: df
    Out[3]: 
       Label  f1  f2  f3  f4
    0      1   0  34  23   2
    1      0   0  21  11   0
    2      3  11   2  11   1

    In [4]: y_train = df['Label']

    In [5]: x_train = df.drop('Label', axis=1)

    In [6]: x_train
    Out[6]: 
       f1  f2  f3  f4
    0   0  34  23   2
    1   0  21  11   0
    2  11   2  11   1

    In [7]: y_train
    Out[7]: 
    0    1
    1    0
    2    3
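
If you want something closer to the datasets.load_mydataset() call from the question, one option is to wrap the same pandas steps in a small loader of your own. The sketch below is only an illustration: load_mydataset and its label_column parameter are hypothetical names, not part of scikit-learn, and it assumes the CSV has the Label,f1,...,f4 header shown above. It returns a sklearn.utils.Bunch, the same kind of container the built-in load_* functions return, and reads from an in-memory string instead of a real file so it can be run as-is.

```python
import io

import pandas as pd
from sklearn.utils import Bunch


def load_mydataset(source, label_column="Label"):
    """Hypothetical loader (not a scikit-learn function).

    Reads a CSV with a header row and returns a Bunch with the
    usual data / target / feature_names attributes.
    """
    df = pd.read_csv(source)
    features = df.drop(label_column, axis=1)
    return Bunch(
        data=features.to_numpy(),            # shape (n_samples, n_features)
        target=df[label_column].to_numpy(),  # shape (n_samples,)
        feature_names=list(features.columns),
    )


# In-memory stand-in for dataset.csv, using the rows from the answer.
csv_text = """Label,f1,f2,f3,f4
1,0,34,23,2
0,0,21,11,0
3,11,2,11,1
"""

mydataset = load_mydataset(io.StringIO(csv_text))
print(mydataset.data.shape)  # (3, 4)
print(mydataset.target)      # [1 0 3]
```

With a real file you would pass the path instead, e.g. load_mydataset('dataset.csv'), and then hand mydataset.data and mydataset.target to any estimator's fit method.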