PySpark error "py4j.protocol.Py4JJavaError: An error occurred while calling o175.withColumn."

Asked: 2019-11-07 22:27:01

Tags: python apache-spark pyspark

I am trying to use the withColumn function to move a column in a Spark DataFrame from somewhere in the middle to the first position.

Here is my PySpark code:

from pyspark import SparkContext, SparkConf
from pyspark.sql import SparkSession

conf = SparkConf()
sc = SparkContext.getOrCreate(conf=conf)
spark = SparkSession(sc)

df_train = spark.createDataFrame([("a", 1, 2), ("a", 1, 2), ("a", 1, 3), ("a", 2, 4), ("b",  3, 5), ("c", 4, 6)], ["C1", "C2", "show_status"])
df_train.show()

columns_without_label = df_train.drop('show_status').columns
print(columns_without_label, type(columns_without_label))

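# NOTE: the post never defines df_train_new. The query plan in the error
# (Project [show_status#2L]) suggests it started as a projection of the label
# column only, so the next line is an assumption, not part of the original post.
df_train_new = df_train.select('show_status')

# Try to append the remaining columns after the label column.
for col_name in columns_without_label:
    df_train_new = df_train_new.withColumn(col_name, df_train[col_name])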

df_train_new.show()

Here is the error message I get:

File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/sql/utils.py", line 63, in deco
  File "/usr/lib/spark/python/lib/py4j-0.10.7-src.zip/py4j/protocol.py", line 328, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o175.withColumn.
: org.apache.spark.sql.AnalysisException: Resolved attribute(s) C2#1L missing from show_status#2L in operator !Project [show_status#2L, (C2#1L + cast(2 as bigint)) AS C2#21L].;;
!Project [show_status#2L, (C2#1L + cast(2 as bigint)) AS C2#21L]
+- Project [show_status#2L]
   +- LogicalRDD [C1#0, C2#1L, show_status#2L], false
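
Note: the plan in the trace shows withColumn being applied to a DataFrame whose only remaining column is show_status, while the expression still refers to C2 from the parent DataFrame. That is most likely why the attribute cannot be resolved. If the goal is only to change the column order, a single select over the reordered column names may be enough. The following is a minimal sketch under that assumption, not a confirmed fix. The helper names move_to_front and reorder_columns are hypothetical, and df is assumed to be a pyspark.sql.DataFrame.

```python
def move_to_front(columns, first):
    """Return the column names with `first` moved to position 0.

    The relative order of the other columns is preserved.
    """
    if first not in columns:
        raise ValueError("column %r not found in %r" % (first, columns))
    return [first] + [c for c in columns if c != first]


def reorder_columns(df, first):
    """Return a new DataFrame with `first` as its leading column.

    Assumes `df` behaves like a pyspark.sql.DataFrame: it exposes
    `.columns` (a list of names) and `.select(*names)`.
    """
    return df.select(*move_to_front(df.columns, first))
```

With the data from the question, the call would be df_train_new = reorder_columns(df_train, "show_status"), followed by df_train_new.show(). Because the select runs on df_train itself, every column it names is still present in the plan.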

0 answers:

No answers yet.