pyspark: delete and merge rows

Time: 2018-06-01 18:35:12

Tags: parsing pyspark

I am trying to parse some files and put the data into a table:

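from pyspark.sql.functions import regexp_extract

File = "somehtml.file"
# Read the file as text, one row per line
Data = spark.read.text(File)

# Extract one field per column; regexp_extract returns an empty string when the pattern does not match
df_file = Data.select(regexp_extract("col1", '(.*?)', 0).alias("somedata"),
                regexp_extract("col1", '(.*?)', 0).alias("somedata2"))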

After that, the result is not what I want:

+--------------------+--------------------+
|            somedata|           somedata2|
+--------------------+--------------------+
|http://sweersdsh.ru....|                    |
|                    |helo my name lololol...|
|                    |                    |
|                    |                    |
|http://qweuiewjk.ru....|                    |
|                    |helo my name alallal...|

I need this:

+--------------------+--------------------+
|            somedata|           somedata2|
+--------------------+--------------------+
|http://sweersdsh.ru....|helo my name lololol...|
|http://qweuiewjk.ru....|helo my name alallal...|

Please help me with this ''.

0 answers:

No answers yet