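I want to trim the time from a timestamp (a column in a DataFrame), keep only the hour value, and store it in a new column of the DataFrame. Please help.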
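Answer 0: (score: 6)
This should work if the goal is to strip leading and trailing whitespace from a string column (trim does not extract the hour from a timestamp):
import org.apache.spark.sql.functions.trim
val DF2 = DF1.withColumn("col_1", trim(DF1("col_1")))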
Answer 1: (score: 1)
You can use one of the built-in functions available for column operations:
For Scala:
import org.apache.spark.sql.functions._
val df2 = df.withColumn("hour", hour(col("timestamp_column")))
For Python:
from pyspark.sql.functions import col, hour
df2 = df.withColumn('hour', hour(col('timestamp_column')))
Answer 2: (score: 1)
Hope this helps:
scala> val df = Seq((" Virat ",18,"RCB"),("Rohit ",45,"MI "),(" DK",67,"KKR ")).toDF("captains","jersey_number","teams")
scala> df.show
+--------+-------------+-----+
|captains|jersey_number|teams|
+--------+-------------+-----+
|  Virat |           18|  RCB|
|  Rohit |           45|  MI |
|      DK|           67| KKR |
+--------+-------------+-----+
scala> val trimmedDF = df.withColumn("captains",trim(df("captains"))).withColumn("teams",trim(df("teams")))
scala> trimmedDF.show
+--------+-------------+-----+
|captains|jersey_number|teams|
+--------+-------------+-----+
|   Virat|           18|  RCB|
|   Rohit|           45|   MI|
|      DK|           67|  KKR|
+--------+-------------+-----+