PySpark: CSV to Parquet error on column header with special characters (" /")

Asked: 2019-03-14 18:52:53

Tags: pyspark

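I'm having trouble running this program. It handles Track Number fine, but Transaction / Time contains special characters and fails. How can I handle this?

What I'm looking for is a way to replace the header row and define my own headers, or to strip the special characters from the header row.

Thanks!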

from pyspark.sql import SparkSession
import traceback
from pyspark.sql.functions import *
import csv
import os

def create_parquet():
    try:
        spark = SparkSession.builder.appName("prc_conversiontoparquet").getOrCreate()
        df = spark.read.option("header", True).option("delimiter", ",").option("multiLine", "true").\
        csv("s3://ert-opp-uw2-external-data-dev/sftp_data/merchant_1.csv")
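        df.createOrReplaceTempView("input")
        # Backticks quote a column name; single quotes would select a string literal instead.
        # The quoted name must match the CSV header exactly, including any spaces around "/".
        # The selected columns must be separated by a comma.
        query = """
            SELECT
                string(`Transaction/Time`) AS Transaction_Time,
                string(`Track Count`) AS Track_Count
            FROM input
        """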
        print(query)
        result = spark.sql(query)
        result.repartition(100).write.mode('overwrite').parquet("s3://ert-opp-uw2-external-data-dev/sftp_data/parquet_files/")

    except Exception as e:
        print(traceback.format_exc())
        print("Error Occurred")
        print(e)


create_parquet()
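
For the two options mentioned above (stripping special characters from the header, or defining my own headers), here is a minimal sketch of what I have in mind. The helper names clean_column_name, with_clean_headers and with_own_headers are hypothetical, made up for illustration; the only PySpark calls assumed are DataFrame.columns and DataFrame.toDF(*cols), which renames columns by position. Note that the regex keeps only ASCII letters and digits, and two different headers could collapse to the same cleaned name, so this is a starting point rather than a finished solution.

```python
import re


def clean_column_name(name):
    """Replace each run of characters other than ASCII letters/digits with one underscore."""
    cleaned = re.sub(r"[^0-9A-Za-z]+", "_", name.strip())
    # Drop underscores left over at either end, e.g. from a trailing "/".
    return cleaned.strip("_")


def with_clean_headers(df):
    """Return df with every column renamed through clean_column_name (hypothetical helper)."""
    return df.toDF(*[clean_column_name(c) for c in df.columns])


def with_own_headers(df, new_names):
    """Return df with its columns renamed positionally to new_names (hypothetical helper)."""
    if len(new_names) != len(df.columns):
        raise ValueError(
            "expected %d names, got %d" % (len(df.columns), len(new_names))
        )
    return df.toDF(*new_names)
```

With something like this, calling df = with_clean_headers(df) right after spark.read...csv(...) should let the SQL refer to Transaction_Time and Track_Count directly, without any backtick quoting.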

0 answers:

No answers yet