Question

I have a vector of labeled data elements, like this:

[label1: 1.1, label2: 2.43, label3: 0.5]

[label1: 0.1, label2: 2.0, label3: 1.0]

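There can be any number of elements, and each element essentially corresponds to one row of data. I am trying to turn this into a CSV with column headers, like this: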

label1 label2 label3
1.1    2.43   0.5
0.1    2.0    1.0

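I have been using a StringBuilder and would prefer to stick with it, but I can use something else if necessary.

I have this almost working, except for separating the header from the first row of numeric results.

I have an outer loop that iterates over the array elements (the "rows") and an inner loop that iterates over each part of an element (the "columns"). In the example above there are 2 rows (elements) and 3 columns (member indices).

My code looks like this (the block below both builds the CSV and prints to the screen):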

StringBuilder builder  = new StringBuilder();

// Write predictions to file
for (int i = 0; i < labeled.size(); i++)      
{
    // Discrete prediction
    double predictionIndex = 
        clf.classifyInstance(newTest.instance(i)); 

    // Get the predicted class label from the predictionIndex.
    String predictedClassLabel =
        newTest.classAttribute().value((int) predictionIndex);

    // Get the prediction probability distribution.
    double[] predictionDistribution = 
        clf.distributionForInstance(newTest.instance(i)); 

    // Print out the true predicted label, and the distribution
    System.out.printf("%5d: predicted=%-10s, distribution=", 
                      i, predictedClassLabel); 

    // Loop over all the prediction labels in the distribution.
    for (int predictionDistributionIndex = 0; 
         predictionDistributionIndex < predictionDistribution.length; 
         predictionDistributionIndex++)
    {
        // Get this distribution index's class label.
        String predictionDistributionIndexAsClassLabel = 
            newTest.classAttribute().value(
                predictionDistributionIndex);

        // Get the probability.
        double predictionProbability = 
            predictionDistribution[predictionDistributionIndex];

        System.out.printf("[%10s : %6.3f]", 
                          predictionDistributionIndexAsClassLabel, 
                          predictionProbability );
        if(i == 0){
            builder.append(predictionDistributionIndexAsClassLabel+",");

            if(predictionDistributionIndex == predictionDistribution.length){
                builder.append("\n");
            }
        }
        // Add probabilities as rows     
        builder.append(predictionProbability+",");

        }

    System.out.printf("\n");
    builder.append("\n");

}

The result currently looks like this:

setosa,1.0,versicolor,0.0,virginica,0.0,
1.0,0.0,0.0,
1.0,0.0,0.0,
1.0,0.0,0.0,
1.0,0.0,0.0,
1.0,0.0,0.0,
1.0,0.0,0.0,

Here setosa, versicolor and virginica are the labels. You can see that it works from the second row onward, but I cannot figure out how to fix the first row.

Answer 1

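If I understand your question correctly, the inner for loop produces both the labels and the values for the first row, so they end up interleaved on the same line. To keep the labels separate, you can change the inner loop along these lines: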

// Declare this once, before the outer loop over the instances.
StringBuilder labelRow = new StringBuilder();

    // Loop over all the prediction labels in the distribution.
    for (int predictionDistributionIndex = 0; 
         predictionDistributionIndex < predictionDistribution.length; 
         predictionDistributionIndex++)
    {
        // Get this distribution index's class label.
        String predictionDistributionIndexAsClassLabel = 
            newTest.classAttribute().value(
                predictionDistributionIndex);

        // Get the probability.
        double predictionProbability = 
            predictionDistribution[predictionDistributionIndex];

        System.out.printf("[%10s : %6.3f]", 
                          predictionDistributionIndexAsClassLabel, 
                          predictionProbability );

        // Collect the header labels separately, on the first row only.
        if (i == 0) {
            labelRow.append(predictionDistributionIndexAsClassLabel + ",");
        }

        // Add probabilities as rows
        builder.append(predictionProbability + ",");
    }

    // After the first row, prepend the finished header line.
    if (i == 0) {
        builder.insert(0, labelRow.toString() + "\n");
    }

This collects the labels in a separate StringBuilder, which you can then insert at the start of the final builder value.
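
For a version that runs without Weka, here is a minimal sketch of the same idea. The class name CsvHeaderSketch, the helper toCsv, and the sample arrays are assumptions made for illustration and are not part of the original code. As a small variation, it writes the comma before every cell except the first, so rows do not end with a trailing comma.

```java
public class CsvHeaderSketch {

    // Hypothetical helper: builds CSV text from the class labels and one
    // row of values per instance, using the same nested loops as above.
    // Assumes labels has at least as many entries as each row has columns.
    static String toCsv(String[] labels, double[][] rows) {
        StringBuilder builder = new StringBuilder();   // data rows
        StringBuilder labelRow = new StringBuilder();  // header, filled on the first row only

        for (int i = 0; i < rows.length; i++) {
            for (int j = 0; j < rows[i].length; j++) {
                // Separator goes before every cell except the first one.
                String separator = (j == 0) ? "" : ",";
                if (i == 0) {
                    labelRow.append(separator).append(labels[j]);
                }
                builder.append(separator).append(rows[i][j]);
            }
            builder.append("\n");
        }

        // Prepend the header line once the data rows have been written.
        if (labelRow.length() > 0) {
            builder.insert(0, labelRow + "\n");
        }
        return builder.toString();
    }

    public static void main(String[] args) {
        // Sample data taken from the example at the top of the question.
        String[] labels = {"label1", "label2", "label3"};
        double[][] rows = {
            {1.1, 2.43, 0.5},
            {0.1, 2.0, 1.0}
        };
        System.out.print(toCsv(labels, rows));
    }
}
```

Appending the header to builder before the outer loop starts would avoid the insert call altogether, as long as the labels are available at that point. The insert variant is kept here because it matches the answer.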

Parsing an array into a CSV with a header using StringBuilder() - problem with the header row
