I tried to create a new DataFrame by selecting Hour + Minute/60 along with the other columns of the DataFrame, like this:
val logon11 = logon1.select("User","PC","Year","Month","Day","Hour","Minute",$"Hour"+$"Minute"/60)
I received the following error:
I suspect the reason is that I cannot pass these argument types to select at the same time. How can I get such a DataFrame?
Answer 0 (score: 8)
A DataFrame's select method accepts either all String arguments or all org.apache.spark.sql.Column arguments, but not a mix of the two. In your case, you are passing both String and Column arguments to select. Use Column arguments throughout:
val logon11 = logon1.select($"User",$"PC",$"Year",$"Month",$"Day",$"Hour",$"Minute",$"Hour"+$"Minute"/60 as "total_hours")
Hope it helps!
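If you would rather not retype every column, a minimal sketch of an alternative (assuming the same logon1 DataFrame and spark.implicits._ in scope; the alias "total_hours" is just an illustrative name) is to map the existing column names to Columns and append the computed one:

import org.apache.spark.sql.functions.col

// Reuse every existing column as a Column and append the derived expression.
val logon11 = logon1.select(logon1.columns.map(col) :+ ($"Hour" + $"Minute" / 60).as("total_hours"): _*)

// selectExpr is another option, since it takes SQL strings for both columns and expressions:
// val logon11 = logon1.selectExpr("*", "Hour + Minute/60 as total_hours")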
Answer 1 (score: 3)
You can use withColumn to create a new column from existing columns, or to derive one from a condition (a conditional sketch follows the output below):
val logon1 = Seq(("User1","PC1",2017,2,12,12,10)).toDF("User","PC","Year","Month","Day","Hour","Minute")
val logon11 = logon1.withColumn("new_col", $"Hour"+$"Minute"/60)
logon11.printSchema()
logon11.show
Output:
root
|-- User: string (nullable = true)
|-- PC: string (nullable = true)
|-- Year: integer (nullable = false)
|-- Month: integer (nullable = false)
|-- Day: integer (nullable = false)
|-- Hour: integer (nullable = false)
|-- Minute: integer (nullable = false)
|-- new_col: double (nullable = true)
+-----+---+----+-----+---+----+------+------------------+
| User| PC|Year|Month|Day|Hour|Minute| new_col|
+-----+---+----+-----+---+----+------+------------------+
|User1|PC1|2017| 2| 12| 12| 10|12.166666666666666|
+-----+---+----+-----+---+----+------+------------------+
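For the conditional case mentioned above, here is a minimal sketch (assuming the same logon11 DataFrame; the column name "off_hours" and the 8-to-18 window are purely illustrative):

import org.apache.spark.sql.functions.when

// Flag rows whose logon hour falls outside an assumed 08:00-18:00 window.
val flagged = logon11.withColumn("off_hours", when($"Hour" < 8 || $"Hour" > 18, true).otherwise(false))
flagged.show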