I have a DataFrame with no header names. I want to drop the last column without passing its column name. Is there a way to do this?
df.drop("colname")
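Instead of passing a column name as above, how can I drop the last column from the DataFrame?
Answer 0: (score: 1)
Use df.schema to resolve the last column, then pass its name to the same drop API: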
df.drop(df.schema.last.name)
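A minimal sketch of the schema-based approach on a DataFrame created without column names. The object name DropLastColumn, the helper dropLast, the sample rows, and the local[*] master are assumptions made for illustration and do not come from the question. Note that schema.last throws if the DataFrame has no columns at all.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object DropLastColumn {
  // Hypothetical helper: look up the last field's name in the schema, then drop by that name.
  def dropLast(df: DataFrame): DataFrame = df.drop(df.schema.last.name)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]") // assumption: running locally
      .appName("drop-last-column")
      .getOrCreate()
    import spark.implicits._

    // No names are supplied, so Spark assigns the default names _1, _2, _3.
    val df = Seq((1, "a", 2.0), (2, "b", 3.0)).toDF()

    // The remaining schema contains only _1 and _2.
    dropLast(df).printSchema()

    spark.stop()
  }
}
```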
Answer 1: (score: 1)
Another option in Scala:
df.drop(df.columns(df.columns.length -1))
drops the last column, and
df.drop(df.columns(0))
drops the first column.
Here is a complete example:
import spark.implicits._ // required for toDS() on a List[String]
val mycsv =
"""
||TemperatureF|Date|timestamp|MinTemp|MaxTemp|
|| 28.0| 01/01/2000 6:53 AM|946709580| 28.0| 37.4|
|| 28.0| 01/01/2000 7:53 AM|946713180| 28.0| 37.4|
|| 28.0| 01/01/2000 8:53 AM|946716780| 28.0| 37.4|
|| 30.2|01/01/2000 10:24 PM|946765440| 30.2| 37.4|
|| 30.9|01/01/2000 10:53 PM|946767180| 30.9| 37.4|
|| 37.4| 01/02/2000 4:39 AM|946787940| 28.0| 37.4|
|| 36.0| 01/02/2000 4:53 AM|946788780| 28.0| 36.0|
|| 36.0| 01/02/2000 5:53 AM|946792380| 28.0| 36.0|
""".stripMargin('|').linesIterator.toList.toDS() // linesIterator avoids the clash with java.lang.String.lines on JDK 11+
val df = spark.read.option("header", true).option("sep", "|").option("inferSchema", true).csv(mycsv)
println("original schema with first and last extra columns")
df.printSchema
val afterfirstAndLastDF = df
  .drop(df.columns(df.columns.length - 1)) // drop last column
  .drop(df.columns(0)) // drop first column
afterfirstAndLastDF.show()
afterfirstAndLastDF.printSchema
Result:
original schema with first and last extra columns
root
|-- _c0: string (nullable = true)
|-- TemperatureF: double (nullable = true)
|-- Date: string (nullable = true)
|-- timestamp: integer (nullable = true)
|-- MinTemp: double (nullable = true)
|-- MaxTemp: double (nullable = true)
|-- _c6: string (nullable = true)
+------------+-------------------+---------+-------+-------+
|TemperatureF|               Date|timestamp|MinTemp|MaxTemp|
+------------+-------------------+---------+-------+-------+
|        28.0| 01/01/2000 6:53 AM|946709580|   28.0|   37.4|
|        28.0| 01/01/2000 7:53 AM|946713180|   28.0|   37.4|
|        28.0| 01/01/2000 8:53 AM|946716780|   28.0|   37.4|
|        30.2|01/01/2000 10:24 PM|946765440|   30.2|   37.4|
|        30.9|01/01/2000 10:53 PM|946767180|   30.9|   37.4|
|        37.4| 01/02/2000 4:39 AM|946787940|   28.0|   37.4|
|        36.0| 01/02/2000 4:53 AM|946788780|   28.0|   36.0|
|        36.0| 01/02/2000 5:53 AM|946792380|   28.0|   36.0|
+------------+-------------------+---------+-------+-------+
root
|-- TemperatureF: double (nullable = true)
|-- Date: string (nullable = true)
|-- timestamp: integer (nullable = true)
|-- MinTemp: double (nullable = true)
|-- MaxTemp: double (nullable = true)
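The positional logic can also be checked without Spark by working on the name array alone. This is only a sketch: the object name TrimEdgeColumns and the method inner are hypothetical, and the array literal copies the column names from the original schema printed above.

```scala
object TrimEdgeColumns {
  // Names that remain once the first and last entries are removed.
  // slice returns an empty array when fewer than three names are given.
  def inner(columns: Array[String]): Array[String] =
    columns.slice(1, columns.length - 1)

  def main(args: Array[String]): Unit = {
    // Same names that df.columns returns for the example DataFrame.
    val columns = Array("_c0", "TemperatureF", "Date", "timestamp", "MinTemp", "MaxTemp", "_c6")
    println(inner(columns).mkString(", "))
    // prints "TemperatureF, Date, timestamp, MinTemp, MaxTemp"
  }
}
```

With a real DataFrame, and as long as at least one name remains, the kept names could be passed to select instead of chaining two drop calls, for example val kept = TrimEdgeColumns.inner(df.columns) followed by df.select(kept.head, kept.tail: _*).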