Adding a column of StructField type to a DataFrame in PySpark

Time: 2018-11-02 06:34:14

Tags: apache-spark dataframe pyspark schema

My input data looks like this:

Customer_ID,General,General

Channel,Nominal,Character

WeekDateSunday,Discrete,Numeric

RevenueWeekN01,Continuous,Numeric

RevenueWeekN02,Continuous,Numeric

RevenueWeekN03,Continuous,Numeric

RevenueWeekN04,Continuous,Numeric

RevenueWeekN05,Continuous,Numeric

RevenueWeekN06,Continuous,Numeric

RevenueWeekN07,Continuous,Numeric

RevenueWeekN08,Continuous,Numeric

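I need to add just one more column so that the data looks like the following (the new column is a StructField derived from the third column):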

Customer_ID,General,General, StructFieldType 

Channel,Nominal,Character, StructField(Channel,StringType(), True) 

WeekDateSunday,Discrete,Numeric, StructField(WeekDateSunday,DoubleType(), True) 

RevenueWeekN01,Continuous,Numeric, StructField(RevenueWeekN01,DoubleType(), True) 

RevenueWeekN02,Continuous,Numeric, StructField(RevenueWeekN02,DoubleType(), True) 

RevenueWeekN03,Continuous,Numeric, StructField(RevenueWeekN03,DoubleType(), True) 

RevenueWeekN04,Continuous,Numeric, StructField(RevenueWeekN04,DoubleType(), True) 

RevenueWeekN05,Continuous,Numeric, StructField(RevenueWeekN05,DoubleType(), True) 

RevenueWeekN06,Continuous,Numeric, StructField(RevenueWeekN06,DoubleType(), True) 

RevenueWeekN07,Continuous,Numeric, StructField(RevenueWeekN07,DoubleType(), True) 

RevenueWeekN08,Continuous,Numeric, StructField(RevenueWeekN08,DoubleType(), True)

Below is the code I used. Is it correct?

data_type.withColumn('structformat',when(col("Description") == 'Numeric', StructField(col("Field_Name"),DoubleType(), True)).otherwise(StructField(col("Field_Name"),StringType(), True))).show()

Running it throws the following error:

AssertionError: field name should be string

1 Answer:

Answer 0 (score: -1)

The error may be caused by the single quotes. Change them to double quotes and the error may go away:

data_type.withColumn("structformat",when(col("Description") == "Numeric", StructField(col("Field_Name"),DoubleType(), True)).otherwise(StructField(col("Field_Name"),StringType(), True))).show()

If you still run into any problems, please leave a comment; if this helped, please accept the answer.

Edit:

Customer_ID,General,General, StructFieldType 

Channel,Nominal,Character, StructField("Channel",StringType(), True) 

WeekDateSunday,Discrete,Numeric, StructField("WeekDateSunday",DoubleType(), True) 

RevenueWeekN01,Continuous,Numeric, StructField("RevenueWeekN01",DoubleType(), True) 

RevenueWeekN02,Continuous,Numeric, StructField("RevenueWeekN02",DoubleType(), True) 

RevenueWeekN03,Continuous,Numeric, StructField("RevenueWeekN03",DoubleType(), True) 

RevenueWeekN04,Continuous,Numeric, StructField("RevenueWeekN04",DoubleType(), True) 

RevenueWeekN05,Continuous,Numeric, StructField("RevenueWeekN05",DoubleType(), True) 

RevenueWeekN06,Continuous,Numeric, StructField("RevenueWeekN06",DoubleType(), True) 

RevenueWeekN07,Continuous,Numeric, StructField("RevenueWeekN07",DoubleType(), True) 

RevenueWeekN08,Continuous,Numeric, StructField("RevenueWeekN08",DoubleType(), True)

Give it a try.
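A note on the error itself: in PySpark, StructField is a schema object built in Python on the driver, and its constructor checks that the field name is a plain Python string. col("Field_Name") is a Column expression rather than a string, which matches the AssertionError in the question, and since Python treats single and double quotes the same way, the quote style alone is unlikely to matter. One possible workaround, sketched below, is to generate the StructField text as an ordinary string. This is only a sketch in plain Python with no Spark required: the helper name struct_field_text and the TYPE_BY_DESCRIPTION mapping are hypothetical, and treating every non-Numeric description as a string is an assumption taken from the otherwise() branch in the question.

```python
# Hypothetical mapping from the third column ("Description") to a Spark type name.
# Anything not listed falls back to StringType(), mirroring the question's otherwise().
TYPE_BY_DESCRIPTION = {"Numeric": "DoubleType()"}


def struct_field_text(field_name, description):
    """Return the StructField declaration for one metadata row, as plain text."""
    spark_type = TYPE_BY_DESCRIPTION.get(description, "StringType()")
    return 'StructField("{}", {}, True)'.format(field_name, spark_type)


# A few rows from the question: (Field_Name, level, Description).
rows = [
    ("Channel", "Nominal", "Character"),
    ("WeekDateSunday", "Discrete", "Numeric"),
    ("RevenueWeekN01", "Continuous", "Numeric"),
]

# Append the generated text as a fourth value on every row.
rows_with_struct = [row + (struct_field_text(row[0], row[2]),) for row in rows]

for row in rows_with_struct:
    print(",".join(row))
```

If real StructField objects are needed rather than text, the same loop could presumably call StructField(name, DoubleType(), True) on rows collected to the driver, because the name is then an ordinary Python string.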