I have a DataFrame with the following schema:
root
|-- very_hot: string (nullable = true)
|-- hot: string (nullable = true)
|-- cold: string (nullable = true)
|-- little_snow: string (nullable = true)
|-- medium_snow: string (nullable = true)
|-- very_cold: string (nullable = true)
|-- deep_snow: string (nullable = true)
|-- freezing: string (nullable = true)
|-- windy: string (nullable = true)
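Each of these columns contains either True or False. I want to create a new column holding an array of the names of the columns whose value is True. How can I do that?
Edit: here is my table: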
+--------+-----+-----+-----------+-----------+---------+---------+--------+-----+
|very_hot|  hot| cold|little_snow|medium_snow|very_cold|deep_snow|freezing|windy|
+--------+-----+-----+-----------+-----------+---------+---------+--------+-----+
|    True|False|False|      False|      False|    False|    False|   False| True|
|   False|False| True|       True|      False|    False|    False|   False|False|
|   False|False| True|      False|       True|    False|    False|   False|False|
|   False|False|False|      False|      False|     True|     True|   False|False|
+--------+-----+-----+-----------+-----------+---------+---------+--------+-----+
The column I want should look like this:
+--------------------+
|            features|
+--------------------+
|     very_hot, windy|
|   cold, little_snow|
|   cold, medium_snow|
|very_cold, deep_snow|
+--------------------+
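Answer 0 (score: 0)
This Scala code, which uses Boolean columns rather than the string columns in the question,
// Outside spark-shell, toDF needs: import spark.implicits._
val data = Seq((true, true, false), (true, false, true), (false, true, true))
val df = data.toDF("first", "second", "third")
// Pair each column name with its position in the row
val names = df.schema.map(_.name).zipWithIndex
df.rdd
  .map(r => names
    .filter(n => r.getBoolean(n._2)) // keep the columns whose value is true
    .map(_._1)
    .mkString(",")
  ).toDF("feature").show
produces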
+------------+
|     feature|
+------------+
|first,second|
| first,third|
|second,third|
+------------+
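Since the columns in the question are strings holding "True" or "False", `getBoolean` would not apply to them directly. The following is a minimal sketch of the same row-wise idea adapted to string columns. It assumes a local SparkSession and a small three-column sample (both are illustrative choices, not taken from the question), and it compares each cell with the literal "True".

```scala
import org.apache.spark.sql.SparkSession

// Assumption: a local session for illustration; in spark-shell `spark` already exists.
val spark = SparkSession.builder().master("local[1]").appName("true-columns-sketch").getOrCreate()
import spark.implicits._

// Hypothetical sample with string flags, mirroring the question's column type
val df = Seq(
  ("True", "False", "True"),
  ("False", "True", "True")
).toDF("first", "second", "third")

// Pair each column name with its position in the row
val names = df.columns.zipWithIndex

// For every row, keep the names of the columns whose string value is "True"
val features = df
  .map(r => names.collect { case (name, i) if r.getString(i) == "True" => name }.mkString(", "))
  .toDF("features")

features.show()
```

A null cell is simply treated as not true here, because the comparison with "True" fails.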
Answer 1 (score: 0)
This code may help you:
import org.apache.spark.sql.functions._
val df = Seq(
  ("True", "False", "False", "False", "False", "False", "False", "False", "True"),
  ("False", "False", "True", "True", "False", "False", "False", "False", "False"),
  ("False", "False", "True", "False", "True", "False", "False", "False", "False"),
  ("False", "False", "False", "False", "False", "True", "True", "False", "False")
).toDF("very_hot", "hot", "cold", "little_snow", "medium_snow", "very_cold", "deep_snow", "freezing", "windy")
df.show()
/*
+--------+-----+-----+-----------+-----------+---------+---------+--------+-----+
|very_hot|  hot| cold|little_snow|medium_snow|very_cold|deep_snow|freezing|windy|
+--------+-----+-----+-----------+-----------+---------+---------+--------+-----+
|    True|False|False|      False|      False|    False|    False|   False| True|
|   False|False| True|       True|      False|    False|    False|   False|False|
|   False|False| True|      False|       True|    False|    False|   False|False|
|   False|False|False|      False|      False|     True|     True|   False|False|
+--------+-----+-----+-----------+-----------+---------+---------+--------+-----+
*/
// Each when(...) yields the column name for "True" and null otherwise; concat_ws skips nulls
val df1 = df.withColumn("features", concat_ws(",",
  when(col("very_hot").contains("True"), "very_hot"),
  when(col("hot").contains("True"), "hot"),
  when(col("cold").contains("True"), "cold"),
  when(col("little_snow").contains("True"), "little_snow"),
  when(col("medium_snow").contains("True"), "medium_snow"),
  when(col("very_cold").contains("True"), "very_cold"),
  when(col("deep_snow").contains("True"), "deep_snow"),
  when(col("freezing").contains("True"), "freezing"),
  when(col("windy").contains("True"), "windy")
)).select("features") // keep only the new column
df1.show()
/*
+-------------------+
|           features|
+-------------------+
|     very_hot,windy|
|   cold,little_snow|
|   cold,medium_snow|
|very_cold,deep_snow|
+-------------------+
*/
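The same `when` / `concat_ws` idea can be written without listing every column by hand. The sketch below builds one `when` expression per column from `df.columns`. It assumes a local SparkSession and a reduced three-column sample (both chosen only for illustration), and it uses ", " as the separator to match the output shown in the question.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, concat_ws, when}

// Assumption: a local session for illustration; in spark-shell `spark` already exists.
val spark = SparkSession.builder().master("local[1]").appName("features-sketch").getOrCreate()
import spark.implicits._

// Hypothetical reduced sample: three of the question's columns
val df = Seq(
  ("True", "False", "True"),
  ("False", "True", "False")
).toDF("very_hot", "cold", "windy")

// One expression per column: the column's name when its value is "True", otherwise null
val flags = df.columns.map(c => when(col(c) === "True", c))

// concat_ws ignores nulls, so only the names of the "True" columns are joined
val result = df.select(concat_ws(", ", flags: _*).as("features"))
result.show(false)
```

Because the expressions are derived from `df.columns`, adding or removing a flag column needs no change to this code. If the DataFrame also has non-flag columns, filter `df.columns` first.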
Answer 2 (score: 0)
Try this.
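Answer 3 (score: 0)
Another option, using the Spark SQL higher-order functions TRANSFORM and FILTER (available since Spark 2.4):
// df2 is the input DataFrame from the question
df2.show(false)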
df2.printSchema()
/**
* +--------+-----+-----+-----------+-----------+---------+---------+--------+-----+
 * |very_hot|hot  |cold |little_snow|medium_snow|very_cold|deep_snow|freezing|windy|
 * +--------+-----+-----+-----------+-----------+---------+---------+--------+-----+
 * |True    |False|False|False      |False      |False    |False    |False   |True |
 * |False   |False|True |True       |False      |False    |False    |False   |False|
 * |False   |False|True |False      |True       |False    |False    |False   |False|
 * |False   |False|False|False      |False      |True     |True     |False   |False|
* +--------+-----+-----+-----------+-----------+---------+---------+--------+-----+
*
* root
* |-- very_hot: string (nullable = true)
* |-- hot: string (nullable = true)
* |-- cold: string (nullable = true)
* |-- little_snow: string (nullable = true)
* |-- medium_snow: string (nullable = true)
* |-- very_cold: string (nullable = true)
* |-- deep_snow: string (nullable = true)
* |-- freezing: string (nullable = true)
* |-- windy: string (nullable = true)
*/
val columns = df2.columns.map(c => s"named_struct('name', '$c', 'value', `$c`)").mkString(", ")
df2.selectExpr(s"TRANSFORM(FILTER(array($columns), x -> x.value='True'), x -> x.name) as features")
  .show(false)
/**
* +----------------------+
 * |features              |
 * +----------------------+
 * |[very_hot, windy]     |
 * |[cold, little_snow]   |
 * |[cold, medium_snow]   |
* |[very_cold, deep_snow]|
* +----------------------+
*/
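For readers who prefer the typed Column API over a SQL string, the same filter-then-transform pipeline could be sketched as follows. This assumes Spark 3.0 or later, where `filter` and `transform` are available in `org.apache.spark.sql.functions`. The local SparkSession and the reduced three-column sample are illustrative assumptions.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{array, col, filter, lit, struct, transform}

// Assumption: a local session for illustration; in spark-shell `spark` already exists.
val spark = SparkSession.builder().master("local[1]").appName("array-features-sketch").getOrCreate()
import spark.implicits._

// Hypothetical reduced sample: three of the question's columns
val df2 = Seq(
  ("True", "False", "True"),
  ("False", "True", "False")
).toDF("very_hot", "cold", "windy")

// One (name, value) struct per column, equivalent to named_struct in the SQL version
val pairs = df2.columns.map(c => struct(lit(c).as("name"), col(c).as("value")))

// Keep the structs whose value is "True", then project out the column names
val features = transform(
  filter(array(pairs: _*), x => x("value") === "True"),
  x => x("name")
).as("features")

val result = df2.select(features)
result.show(false)
```

Unlike the `concat_ws` approach, this yields an actual array column, which is what the question asks for.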