Split function in Spark

Time: 2017-10-24 18:19:19

Tags: apache-spark pyspark

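I want to split an address into two columns, streetno and streetname, e.g. from select address1 from customer.

For example, the address 2719 STONE CREEK DR should be stored as street number 2719 and street name STONE CREEK DR.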

select regexp_extract(address1, '(?<=\s).*', 0), '|', split(address1, '[\]')[0] from table;

Sample data:

Input: [image]

Expected output:

Output: [image]

But when I run the query above, I get no results in Spark, although it does return results in Hive.

1 answer:

Answer 0 (score: 0)

If I understand correctly:

>>> qry = """
... select split(address1, '\\\\s+')[0] as streetnumber,
...        regexp_replace(address1, '^\\\\d+\\\\s+', '') as streetname
... from table"""
>>> spark.sql(qry).show()
+------------+-----------------+
|streetnumber|       streetname|
+------------+-----------------+
|         100|HORACE GREELEY RD|
|          55|    School Street|
|        2893|       MASHIE CIR|
|        1200|         JEWEL DR|
|         201|       W RIVER RD|
+------------+-----------------+
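
To see what the two expressions in the query do to a single value, the same regular expressions can be tried in plain Python without a Spark session. This is only a rough sketch: split_address is a hypothetical helper name, Python's re module stands in for the Java regex engine that Spark uses, and the sample addresses are reassembled from the question and the output table above.

```python
import re

def split_address(address):
    """Return (street_number, street_name) for one address string.

    Hypothetical helper, not part of Spark; it mirrors the two SQL expressions.
    """
    # Counterpart of split(address1, '\\s+')[0]: the first whitespace-separated token.
    street_number = re.split(r"\s+", address)[0]
    # Counterpart of regexp_replace(address1, '^\\d+\\s+', ''): drop the leading digits and spaces.
    street_name = re.sub(r"^\d+\s+", "", address)
    return street_number, street_name

# Sample addresses (assumed inputs, rebuilt from the question and the output above).
rows = ["2719 STONE CREEK DR", "100 HORACE GREELEY RD", "55 School Street"]
results = [split_address(a) for a in rows]

for number, name in results:
    print(number, "|", name)
```

One thing this makes visible: an address with no leading number is left unchanged by the replace step, while the split step still returns its first word, so such rows would probably need separate handling.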