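I am using the Php-ai/Php-ml framework. In the example they give, the AI uses only a single feature, but on the Git home page they also give an example that uses multiple features: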
$samples = [[1, 3], [1, 4], [2, 4], [3, 1], [4, 1], [4, 2]];
$labels = ['a', 'a', 'a', 'b', 'b', 'b'];
$classifier = new KNearestNeighbors();
$classifier->train($samples, $labels);
echo $classifier->predict([3, 2]);
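The first example supplies only one feature per sample, whereas this second example supplies two. I am trying to recreate it using two features. My current code looks like this: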
public function train(Request $request) {
    # CSV File
    $file = $request->file('dataframe');
    # Features + 1 will be the labels column
    $dataset = new CsvDataset($file, (int) $request->features);
    $vectorizer = new TokenCountVectorizer(new WordTokenizer());
    $tfIdfTransformer = new TfIdfTransformer();
    $finalSamples = [];
    for($i = 0; $i <= $request->features -1; $i++):
        # Collect column $i of every sample
        $samples = [];
        foreach ($dataset->getSamples() as $sample)
            $samples[] = $sample[$i];
        $vectorizer->fit($samples);
        $vectorizer->transform($samples);
        $tfIdfTransformer->fit($samples);
        $tfIdfTransformer->transform($samples);
        $finalSamples[] = $samples;
    endfor;
    # This gives us an output of Array[ 0 => [Feature 1, Feature 2], 1 => [Feature 1, Feature 2], ... ] like shown on example two.
    $result = [];
    foreach($finalSamples as $arr)
        foreach($arr as $k => $v)
            $result[$k][] = $v;
    $dataset = new ArrayDataset($result, $dataset->getTargets());
    $randomSplit = new StratifiedRandomSplit($dataset, 0.1);
    $classifier = new SVC(Kernel::RBF, 10000);
    # Train on the training split (the 0.1 above is the test size)
    $classifier->train($randomSplit->getTrainSamples(), $randomSplit->getTrainLabels());
    $predictedLabels = $classifier->predict($randomSplit->getTestSamples());
    $inputLabels = $randomSplit->getTestLabels();
}
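To make the intended layout concrete, the regrouping step (one array per feature turned into one array per sample) can be checked on its own, without Php-ml. This is only a minimal sketch in plain PHP: the helper name columnsToSamples() and the arrays $sibSp and $parch are hypothetical, and the cast to float assumes the features are plain numbers, as in the README example.

```php
<?php
// Hypothetical helper (not part of Php-ml): regroups per-feature columns
// into per-sample rows, i.e. [[f1 values], [f2 values]] -> [[f1, f2], ...].
function columnsToSamples(array $columns): array
{
    $samples = [];
    foreach ($columns as $column) {
        foreach ($column as $row => $value) {
            // Assumption: every feature is numeric, so cast it to float.
            $samples[$row][] = (float) $value;
        }
    }
    return $samples;
}

// Hypothetical columns, using the first three CSV rows as an illustration.
$sibSp = ['1', '3', '4'];
$parch = ['1', '3', '1'];

$samples = columnsToSamples([$sibSp, $parch]);
// $samples is now [[1.0, 1.0], [3.0, 3.0], [4.0, 1.0]]
var_dump($samples);
```

Running var_dump() on $result in the method above and comparing it with this shape would show whether each inner element is a single number, as in the README example, or something else.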
My CSV file looks like this:
"SibSp","Parch","Survived",
"1", "1", "1",
"3", "3", "1",
"4", "1", "0"
"4", "0", "1",
"5", "2", "0"
"3", "1", "0",
"2", "2", "1",
"0", "0", "1"
The problem shows up when I visualize the data, which I do like this:
$newDataFrame = [];
$incorrect = 0;
for($i = 0; $i <= count($inputLabels) -1; $i++):
    $newDataFrame[] = (object) ['input' => $inputLabels[$i], 'output' => $predictedLabels[$i]];
    if($inputLabels[$i] != $predictedLabels[$i]) $incorrect++;
endfor;
$correct = count($inputLabels) - $incorrect;
$score = round((float)Accuracy::score(isset($request->train) ? $randomSplit->getTestLabels() : $inputLabels, $predictedLabels) * 100 );
The result always comes out as 1 correct and 1 incorrect, for a score of 50 (%).
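How can I use this classifier with multiple features instead of just one? I think the problem lies in how I build the ArrayDataset, but I don't know what is wrong with it.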