I have this DataFrame:
+----+-------+-----------+...+------+----------------+---------+
|mot1| brand| device|...|action|Column_to_modify|New_value|
+----+-------+-----------+...+------+----------------+---------+
| 09| Tesla| PC|...|modify| brand| Jeep|
| 10| Tesla|SmallTablet|...|modify| brand| Jeep|
| 09| Tesla| PC|...|modify| brand| Jeep|
| 10| Tesla|SmallTablet|...|modify| mot1| 20|
| 10| Tesla|SmallTablet|...|modify| mot1| 20|
+----+-------+-----------+...+------+----------------+---------+
How can I modify the columns using the values in the "Column_to_modify" and "New_value" columns?
What I want is:
+----+-------+-----------+...+------+----------------+---------+
|mot1| brand| device|...|action|Column_to_modify|New_value|
+----+-------+-----------+...+------+----------------+---------+
| 09| Jeep| PC|...|modify| brand| Jeep|
| 10| Jeep|SmallTablet|...|modify| brand| Jeep|
| 09| Jeep| PC|...|modify| brand| Jeep|
| 20| Tesla|SmallTablet|...|modify| mot1| 20|
| 20| Tesla|SmallTablet|...|modify| mot1| 20|
+----+-------+-----------+...+------+----------------+---------+
Any ideas?
Answer 0 (score: 1)
Apply a UDF to each column that may need to be modified:
import org.apache.spark.sql.functions.{lit, udf}
import spark.implicits._ // for toDF and the $"..." syntax (spark-shell already provides `spark`)

val df = List(
  ("09", "Tesla", "PC", "modify", "brand", "Jeep"),
  ("10", "Tesla", "SmallTablet", "modify", "brand", "Jeep"),
  ("09", "Tesla", "PC", "modify", "brand", "Jeep"),
  ("10", "Tesla", "SmallTablet", "modify", "mot1", "20"),
  ("10", "Tesla", "SmallTablet", "modify", "mot1", "20")
).toDF("mot1", "brand", "device", "action", "Column_to_modify", "New_value")

// Returns New_value if this column is the one named in Column_to_modify, otherwise keeps the current value
val modifyColumn = (colName: String, colValue: String, modifyColumnName: String, modifyColumnValue: String) =>
  if (colName.equals(modifyColumnName)) modifyColumnValue else colValue
val modifyColumnUDF = udf(modifyColumn)

// Apply the UDF once for every column that can appear in Column_to_modify
val result = df
  .withColumn("mot1", modifyColumnUDF(lit("mot1"), $"mot1", $"Column_to_modify", $"New_value"))
  .withColumn("brand", modifyColumnUDF(lit("brand"), $"brand", $"Column_to_modify", $"New_value"))

result.show(false)
Output:
+----+-----+-----------+------+----------------+---------+
|mot1|brand|device |action|Column_to_modify|New_value|
+----+-----+-----------+------+----------------+---------+
|09 |Jeep |PC |modify|brand |Jeep |
|10 |Jeep |SmallTablet|modify|brand |Jeep |
|09 |Jeep |PC |modify|brand |Jeep |
|20 |Tesla|SmallTablet|modify|mot1 |20 |
|20 |Tesla|SmallTablet|modify|mot1 |20 |
+----+-----+-----------+------+----------------+---------+
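A UDF is not strictly necessary here. A minimal sketch, assuming all data columns are strings (as in the question, since `New_value` is a string column), expresses the same rule with Spark's built-in `when`/`otherwise` functions and folds it over every column except the two control columns. That way `device`, `action`, or any other column can be targeted without writing one `withColumn` call per column. The local `SparkSession` setup and the `dataColumns` name are illustrative assumptions, not part of the answer above.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, lit, when}

// Local session for the sketch; in spark-shell, getOrCreate() returns the existing session.
val spark = SparkSession.builder().master("local[*]").appName("modify-with-when").getOrCreate()
import spark.implicits._

val df = List(
  ("09", "Tesla", "PC", "modify", "brand", "Jeep"),
  ("10", "Tesla", "SmallTablet", "modify", "brand", "Jeep"),
  ("09", "Tesla", "PC", "modify", "brand", "Jeep"),
  ("10", "Tesla", "SmallTablet", "modify", "mot1", "20"),
  ("10", "Tesla", "SmallTablet", "modify", "mot1", "20")
).toDF("mot1", "brand", "device", "action", "Column_to_modify", "New_value")

// Every column that Column_to_modify could name (the two control columns are excluded).
val dataColumns = df.columns.filterNot(c => c == "Column_to_modify" || c == "New_value")

// For each data column c: take New_value on rows whose Column_to_modify equals c, otherwise keep c.
// withColumn on an existing name replaces it in place, so the column order is unchanged.
val result: DataFrame = dataColumns.foldLeft(df) { (acc, c) =>
  acc.withColumn(c, when(col("Column_to_modify") === lit(c), col("New_value")).otherwise(col(c)))
}

result.show(false)
```

Built-in expressions remain visible to Spark's Catalyst optimizer, whereas a Scala UDF is opaque to it; for a handful of columns either approach should work.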
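Answer 1 (score: 0)
A quick way to achieve this is to use a `map` operation over the rows and convert the result back into the required data format, as follows: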
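// Requires the org.json library (Maven artifact org.json:json) on the classpath
import org.json.JSONObject
import sparkSession.implicits._ // for toDS()

// Creating the input DataFrame by reading the input file
val inputDF = sparkSession.read.option("header", "true").csv("my_input_file.csv")
inputDF.printSchema()
inputDF.show(false)

// Serialize each row to JSON and overwrite the field named in Column_to_modify with New_value
val resultRDD = inputDF.toJSON.rdd.map(row => {
  val json = new JSONObject(row)
  val columnToModify = json.getString("Column_to_modify")
  val newValue = json.get("New_value")
  // Only overwrite if the named column exists in this row
  if (json.has(columnToModify)) {
    json.put(columnToModify, newValue)
  }
  json.toString
})

// Converting the result back into a DataFrame
// (read.json(RDD[String]) is deprecated since Spark 2.2, so pass a Dataset[String] instead)
val finalOutputDF = sparkSession.read.json(resultRDD.toDS())
finalOutputDF.printSchema()
finalOutputDF.show(false)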
The output is as follows (the input DataFrame first, then the result):
root
|-- mot1: string (nullable = true)
|-- brand: string (nullable = true)
|-- device: string (nullable = true)
|-- action: string (nullable = true)
|-- Column_to_modify: string (nullable = true)
|-- New_value: string (nullable = true)
+----+-----+-----------+------+----------------+---------+
|mot1|brand|device |action|Column_to_modify|New_value|
+----+-----+-----------+------+----------------+---------+
|09 |Tesla|PC |modify|brand |Jeep |
|10 |Tesla|SmallTablet|modify|brand |Jeep |
|09 |Tesla|PC |modify|brand |Jeep |
|10 |Tesla|SmallTablet|modify|mot1 |20 |
|10 |Tesla|SmallTablet|modify|mot1 |20 |
+----+-----+-----------+------+----------------+---------+
root
|-- Column_to_modify: string (nullable = true)
|-- New_value: string (nullable = true)
|-- action: string (nullable = true)
|-- brand: string (nullable = true)
|-- device: string (nullable = true)
|-- mot1: string (nullable = true)
+----------------+---------+------+-----+-----------+----+
|Column_to_modify|New_value|action|brand|device |mot1|
+----------------+---------+------+-----+-----------+----+
|brand |Jeep |modify|Jeep |PC |09 |
|brand |Jeep |modify|Jeep |SmallTablet|10 |
|brand |Jeep |modify|Jeep |PC |09 |
|mot1 |20 |modify|Tesla|SmallTablet|20 |
|mot1 |20 |modify|Tesla|SmallTablet|20 |
+----------------+---------+------+-----+-----------+----+
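Note that in the second DataFrame the columns (the JSON keys) come back in sorted order, because the schema is inferred from the JSON strings, but the values are the desired output.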
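If the original column order matters, one option is to reorder the result afterwards, e.g. `finalOutputDF.select(inputDF.columns.map(col): _*)` with `org.apache.spark.sql.functions.col`. Another is to skip the JSON round trip entirely. The sketch below maps over each `Row` and rebuilds the DataFrame with the input schema, so column order and types are kept and no `org.json` dependency is needed. It assumes an in-memory copy of the question's data in place of `my_input_file.csv`, uses a hypothetical helper named `applyModifications`, and assumes `New_value` has the same type as the column it replaces (all strings here); otherwise `createDataFrame` would fail at runtime.

```scala
import org.apache.spark.sql.{DataFrame, Row, SparkSession}

val spark = SparkSession.builder().master("local[*]").appName("modify-with-rows").getOrCreate()
import spark.implicits._

// Stand-in for the CSV input: same columns and rows as in the question.
val inputDF = List(
  ("09", "Tesla", "PC", "modify", "brand", "Jeep"),
  ("10", "Tesla", "SmallTablet", "modify", "brand", "Jeep"),
  ("09", "Tesla", "PC", "modify", "brand", "Jeep"),
  ("10", "Tesla", "SmallTablet", "modify", "mot1", "20"),
  ("10", "Tesla", "SmallTablet", "modify", "mot1", "20")
).toDF("mot1", "brand", "device", "action", "Column_to_modify", "New_value")

// Hypothetical helper: in each row, overwrite the column named by Column_to_modify with New_value.
def applyModifications(df: DataFrame): DataFrame = {
  val columns = df.columns // original column order
  val targetIdx = columns.indexOf("Column_to_modify")
  val valueIdx = columns.indexOf("New_value")
  val modified = df.rdd.map { row =>
    // Position of the column to overwrite; -1 if the name is not a column (or is null)
    val i = columns.indexOf(row.getString(targetIdx))
    if (i >= 0) Row.fromSeq(row.toSeq.updated(i, row.get(valueIdx))) else row
  }
  // Reuse the input schema, so column order and types are unchanged
  df.sparkSession.createDataFrame(modified, df.schema)
}

val finalOutputDF = applyModifications(inputDF)
finalOutputDF.show(false)
```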