Question

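I have been working from this linked answer, but I have a more specific requirement.

I only need to select the columns whose names start with "cat". I can't work out how to select columns by a pattern. I don't need to filter the rows of the DataFrame, only select the columns whose names start with that pattern.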

val transformers: Array[PipelineStage] = df.select("cat*").columns.map(
  cname =>
    new StringIndexer()
      .setInputCol(cname)
      .setOutputCol(s"${cname}_index")
  )

val stages: Array[PipelineStage] = transformers

val pipeline = new Pipeline().setStages(stages)
val model = pipeline.fit(df)

This code produces the following error:

org.apache.spark.sql.AnalysisException: cannot resolve 'cat*' given input columns: [cat3, cat7, cat25,...

Answer 1

This is simple. You just need to filter the columns whose names start with "cat", as follows:

// Keep only the column names that start with "cat"
val catColumns: Array[String] = df.columns.filter(_.startsWith("cat"))

Answer 2

Why select from the DataFrame just to get the columns? Why not filter all of the column names directly:

val transformers: Array[PipelineStage] = df.columns.filter(_.startsWith("cat")).map(
  cname =>
    new StringIndexer()
      .setInputCol(cname)
      .setOutputCol(s"${cname}_index")
  )
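
Since `df.columns` is a plain `Array[String]`, any string predicate can be used in the filter, including a regular expression. The following is only a minimal sketch in plain Scala, not part of the original answer: the `columns` array is a hypothetical stand-in for `df.columns`, and the name `category_note` is invented to show where a prefix match and a stricter pattern differ.

```scala
object ColumnNamePatterns {
  // Hypothetical stand-in for df.columns, which returns Array[String].
  val columns: Array[String] =
    Array("id", "cat3", "cat7", "cat25", "category_note", "price")

  // Prefix match, as in the answer above. This also keeps "category_note".
  val byPrefix: Array[String] = columns.filter(_.startsWith("cat"))

  // Stricter match: "cat" followed only by digits.
  // String.matches tests the whole string against the regex.
  val byRegex: Array[String] = columns.filter(_.matches("cat\\d+"))

  def main(args: Array[String]): Unit = {
    println(byPrefix.mkString(", "))
    println(byRegex.mkString(", "))
  }
}
```

Either array could then be mapped to `StringIndexer` stages in the same way as above. Which predicate is appropriate depends on how the real columns are named.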
