Creating a Spark DataFrame from a .dat.gz file stored in AWS S3 using AWS Glue

Time: 2019-01-29 14:19:13

Tags: python amazon-web-services apache-spark amazon-s3 aws-glue

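I wrote PySpark code that runs in AWS Glue and reads a .dat.gz file. The DataFrame is created successfully, but Trim(BOTH FROM ...) has been added to every column name. Below is my code snippet.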

df = spark.read.format("csv").option("header", 'false').option("delimiter", '|').load("s3://xxxxxx/xxxx/xxxxx/xxx/xxxxxxxxxx.dat.gz")
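Spark picks the gzip codec from the .gz suffix and decompresses the file transparently, so the options above amount to: split each line on '|' and treat the first line as data rather than as a header. As a hedged sanity check (not part of the original post), here is a minimal stdlib sketch that applies the same options to a local copy of the file and returns the first few records, so the raw first line can be inspected for text such as Trim(BOTH FROM ...). The function name peek_pipe_delimited_gz and the UTF-8 encoding are assumptions.

```python
import csv
import gzip


def peek_pipe_delimited_gz(path, n_rows=2):
    """Return the first n_rows records of a gzip-compressed, pipe-delimited file.

    Mirrors the reader options in the question (delimiter '|', no header row).
    Hypothetical helper; the UTF-8 encoding is an assumption about the file.
    """
    rows = []
    # "rt" decompresses and decodes to text; newline="" is what the csv module expects
    with gzip.open(path, "rt", encoding="utf-8", newline="") as handle:
        reader = csv.reader(handle, delimiter="|")
        for row in reader:
            rows.append(row)
            if len(rows) >= n_rows:
                break
    return rows
```

For example, peek_pipe_delimited_gz("xxxxxxxxxx.dat.gz") on a downloaded copy shows whether the first record holds plain names, decorated names, or ordinary data.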

Output


+----------------------+------------------------+-------------------------+-------------------------+--------------------------+----------------------+----------------------------+----------------------------+----------------------------------+--------------------------------+--------------------------+---------------------------+-----------------------+-----------------------+--------------------------+-------------------------+---------------------------+------------------------+-------------------------+-----------------------+-----------------------+--------------------------+---------------------------+
|Trim(BOTH FROM EFF_DT)|Trim(BOTH FROM SITE_NUM)|Trim(BOTH FROM ARTCL_NUM)|Trim(BOTH FROM SL_UOM_CD)|Trim(BOTH FROM COND_TY_CD)|Trim(BOTH FROM EXP_DT)|Trim(BOTH FROM COND_REC_NUM)|Trim(BOTH FROM MAIN_SCAN_CD)|Trim(BOTH FROM PRC_COND_PRRTY_NUM)|Trim(BOTH FROM PRC_COND_WIN_IND)|Trim(BOTH FROM PRC_RSN_CD)|Trim(BOTH FROM PRC_METH_CD)|Trim(BOTH FROM PRC_AMT)|Trim(BOTH FROM PRC_QTY)|Trim(BOTH FROM UT_PRC_AMT)|Trim(BOTH FROM PROMO_NUM)|Trim(BOTH FROM BNS_BUY_NUM)|Trim(BOTH FROM CURRN_CD)|Trim(BOTH FROM BBY_TY_CD)|Trim(BOTH FROM BBY_AMT)|Trim(BOTH FROM BBY_PCT)|Trim(BOTH FROM BBY_LEV_CD)|Trim(BOTH FROM BBY_PRC_QTY)|
+----------------------+------------------------+-------------------------+-------------------------+--------------------------+----------------------+----------------------------+----------------------------+----------------------------------+--------------------------------+--------------------------+---------------------------+-----------------------+-----------------------+--------------------------+-------------------------+---------------------------+------------------------+-------------------------+-----------------------+-----------------------+--------------------------+---------------------------+

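When I read any other file, I get the correct output. Can anyone help me with this? It is not a problem with the file itself: I ran the same code on my local machine and it worked fine.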
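With header set to 'false', Spark's CSV reader names the columns _c0, _c1, and so on, so names such as EFF_DT must come from some other step in the job. The Trim(BOTH FROM ...) wrapper looks like an auto-generated name for a trim expression selected without an alias, but the question does not show such a step, so treat that as an assumption. If the goal is simply to get the plain names back, a minimal sketch is to strip the wrapper from each column name with a regular expression. The helpers strip_trim_wrapper and clean_column_names are hypothetical, not part of Spark.

```python
import re

# Matches names like "Trim(BOTH FROM EFF_DT)" or "trim(EFF_DT)", case-insensitively.
# The exact naming pattern is an assumption based on the output shown above.
_TRIM_WRAPPER = re.compile(
    r"^trim\(\s*(?:(?:BOTH|LEADING|TRAILING)\s+)?(?:FROM\s+)?(.+?)\s*\)$",
    re.IGNORECASE,
)


def strip_trim_wrapper(name):
    """Return the inner column name, or the name unchanged if it has no trim wrapper."""
    match = _TRIM_WRAPPER.match(name)
    return match.group(1) if match else name


def clean_column_names(names):
    """Apply strip_trim_wrapper to every name, preserving order."""
    return [strip_trim_wrapper(n) for n in names]
```

In the Glue job this could be applied with df = df.toDF(*clean_column_names(df.columns)), since DataFrame.toDF takes the new column names positionally. If the names do come from trim(...) calls, adding .alias(...) to each trimmed column at that point would avoid the wrapper in the first place.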

0 answers