How to perform a trim operation on a column in a Spark DataFrame

Date: 2016-05-12 10:13:52

Tags: apache-spark-sql spark-dataframe

I want to trim the time from a timestamp (a column in my DataFrame), keep only the hour value, and store it in a new column of the DataFrame. Please help.

3 answers:

Answer 0 (score: 6)

This should work (note that `trim` removes leading and trailing whitespace, not the time portion of a timestamp):

val DF2 = DF1.withColumn("col_1", trim(DF1("col_1")))

Answer 1 (score: 1)

To extract the hour from a timestamp, you can use one of the functions available for column operations:

For Scala:

import org.apache.spark.sql.functions._
val df2 = df.withColumn("hour", hour(col("timestamp_column")))

For Python:

from pyspark.sql.functions import hour, col
df2 = df.withColumn('hour', hour(col('timestamp_column')))
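
To illustrate what `hour` does without a running Spark session, here is a plain-Python sketch of the same idea using the standard library `datetime` module; the timestamp string and its format are assumptions for the example:

```python
from datetime import datetime

# Sketch of what Spark's hour() computes: parse a timestamp
# and keep only the hour component.
ts = "2016-05-12 10:13:52"  # example timestamp (assumed format)
hour_value = datetime.strptime(ts, "%Y-%m-%d %H:%M:%S").hour
print(hour_value)  # → 10
```

Spark applies the same extraction to every row of the timestamp column and stores the result in the new `hour` column.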


Answer 2 (score: 1)

Hope this helps:

val df = Seq((" Virat ",18,"RCB"),("Rohit ",45,"MI "),(" DK",67,"KKR ")).toDF("captains","jersey_number","teams")

scala> df.show

+--------+-------------+-----+
|captains|jersey_number|teams|
+--------+-------------+-----+
|  Virat |           18|  RCB|
|  Rohit |           45|  MI |
|      DK|           67| KKR |
+--------+-------------+-----+

scala> val trimmedDF = df.withColumn("captains", trim(df("captains"))).withColumn("teams", trim(df("teams")))

scala> trimmedDF.show

+--------+-------------+-----+
|captains|jersey_number|teams|
+--------+-------------+-----+
|   Virat|           18|  RCB|
|   Rohit|           45|   MI|
|      DK|           67|  KKR|
+--------+-------------+-----+
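
Spark's `trim` strips leading and trailing space characters from a string column, as the output above shows. A plain-Python sketch of the same behavior, using the example values from the DataFrame:

```python
# Plain-Python analogue of Spark's trim(): strip leading and
# trailing spaces from each value in the "captains" column.
captains = [" Virat ", "Rohit ", " DK"]
trimmed = [name.strip(" ") for name in captains]
print(trimmed)  # → ['Virat', 'Rohit', 'DK']
```

Interior spaces are untouched; only the surrounding padding is removed, which is why the padded column values line up cleanly after trimming.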