Question

I have installed scikit-learn, but I don't know how to use it. I have some data that looks like this:

{"Tiempo": 2.1,  "Brazos": "der", "Puntuacion ": 112, "Nombre": "Alguien1"},
{"Tiempo": 4.1, "Brazos": "izq", "Puntuacion ": 11, "Nombre": "Alguien2"},
{"Tiempo": 3.211,  "Brazos": "ambos","Puntuacion ": 1442, "Nombre": "Alguien3"}

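I want to use some classifiers (such as SVM) on it. From what I have seen in the examples, I need to create a dataset. The examples always use a predefined dataset such as "iris". In my case, I think I need to build my own dataset from my data. To do that, I searched a bit and found that I should use the following function to get the "features" of my dataset: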

measurements = [
    {'city': 'Dubai', 'temperature': 33.},
    {'city': 'London', 'temperature': 12.},
    {'city': 'San Fransisco', 'temperature': 18.},
]

from sklearn.feature_extraction import DictVectorizer
vec = DictVectorizer()

vec.fit_transform(measurements).toarray()
# array([[  1.,   0.,   0.,  33.],
#        [  0.,   1.,   0.,  12.],
#        [  0.,   0.,   1.,  18.]])

vec.get_feature_names()
# ['city=Dubai', 'city=London', 'city=San Fransisco', 'temperature']

In my case, after applying this function to my data, I got this: (image)

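With this, I think I need to get the "samples", but I don't know how to do that. Could you please help me? And could you tell me whether my assumptions are correct?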

Answer 1

You are on the right track. Take your data as an example.

from sklearn.feature_extraction import DictVectorizer

# your data
data = [{"Tiempo": 2.1, "Brazos": "der", "Puntuacion ": 112, "Nombre": "Alguien1"},
        {"Tiempo": 4.1, "Brazos": "izq", "Puntuacion ": 11, "Nombre": "Alguien2"},
        {"Tiempo": 3.211, "Brazos": "ambos", "Puntuacion ": 1442, "Nombre": "Alguien3"}]

# make dummies for the categorical variables
transformer = DictVectorizer()
transformer.fit_transform(data).toarray()

Out[168]:
array([[  0.0000e+00,   1.0000e+00,   0.0000e+00,   1.0000e+00,
          0.0000e+00,   0.0000e+00,   1.1200e+02,   2.1000e+00],
       [  0.0000e+00,   0.0000e+00,   1.0000e+00,   0.0000e+00,
          1.0000e+00,   0.0000e+00,   1.1000e+01,   4.1000e+00],
       [  1.0000e+00,   0.0000e+00,   0.0000e+00,   0.0000e+00,
          0.0000e+00,   1.0000e+00,   1.4420e+03,   3.2110e+00]])

transformer.get_feature_names()

Out[170]:
['Brazos=ambos',
 'Brazos=der',
 'Brazos=izq',
 'Nombre=Alguien1',
 'Nombre=Alguien2',
 'Nombre=Alguien3',
 'Puntuacion ',
 'Tiempo']

So you see, each record in data has 8 columns: the first three are the categorical dummies for Brazos (look at the feature names in Out[170]), the next three are the dummies for Nombre, and the last two are the continuous numeric values Puntuacion and Tiempo (which do not require any transformation and stay as they were).
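To go from this feature matrix to the "samples" and a classifier, one possible next step is sketched below. It is only an illustration under assumptions the question does not state: that Brazos is the class to predict, and that Nombre is just an identifier that should not be used as a feature. The names TARGET, DROP and new_row are made up for the example.

```python
from sklearn.feature_extraction import DictVectorizer
from sklearn.svm import SVC

data = [
    {"Tiempo": 2.1, "Brazos": "der", "Puntuacion ": 112, "Nombre": "Alguien1"},
    {"Tiempo": 4.1, "Brazos": "izq", "Puntuacion ": 11, "Nombre": "Alguien2"},
    {"Tiempo": 3.211, "Brazos": "ambos", "Puntuacion ": 1442, "Nombre": "Alguien3"},
]

# Assumption: "Brazos" is the label; "Nombre" identifies the row and is dropped.
TARGET = "Brazos"
DROP = {"Nombre", TARGET}

# One label per record (the targets).
y = [row[TARGET] for row in data]

# One feature dict per record (the samples), without label and identifier.
features = [{k: v for k, v in row.items() if k not in DROP} for row in data]

# sparse=False returns a dense NumPy array directly.
vec = DictVectorizer(sparse=False)
X = vec.fit_transform(features)

# Column order of X, as stored by the fitted vectorizer.
print(vec.feature_names_)
print(X.shape)

# Fit a linear SVM on the samples and their labels.
clf = SVC(kernel="linear")
clf.fit(X, y)

# New records must go through the same fitted vectorizer (transform, not fit_transform).
new_row = {"Tiempo": 2.0, "Puntuacion ": 100}  # hypothetical unseen record
pred = clf.predict(vec.transform([new_row]))
print(pred)
```

Each row of X is one sample and each entry of y is its label, which is the (X, y) pair that scikit-learn estimators expect in fit. Three records are far too few to learn anything meaningful; with real data you would also want to hold out a test set and probably scale the numeric columns before fitting an SVM.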


SciKit-learn (python) - creating my dataset
