I am trying to parse some files and put the data into a table:
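from pyspark.sql.functions import regexp_extract
File = "somehtml.file"
Data = spark.read.text(File)
# Apply each pattern to every line of the file independently
df_file = Data.select(
    regexp_extract("col1", '(.*?)', 0).alias("somedata"),
    regexp_extract("col1", '(.*?)', 0).alias("somedata2"))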
After that I do not get the correct result:
+--------------------+--------------------+
| somedata| somedata2|
+--------------------+--------------------+
|http://sweersdsh.ru....| |
| |helo my name lololol...|
| | |
| | |
|http://qweuiewjk.ru....| |
| |helo my name alallal...|
I need this:
+--------------------+--------------------+
| somedata| somedata2|
+--------------------+--------------------+
|http://sweersdsh.ru....|helo my name lololol...|
|http://qweuiewjk.ru....|helo my name alallal...|
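To make the pairing I am after concrete, here is a minimal sketch in plain Python (no Spark). It assumes the rows arrive in file order and that each URL line is followed, possibly after some empty rows, by its text line. The function name `pair_rows` and the sample values are hypothetical placeholders, not my real data.

```python
def pair_rows(rows):
    """Merge rows where each field was extracted from a separate line.

    rows: list of (somedata, somedata2) tuples in file order, where an
    empty string means the pattern did not match on that line.
    """
    paired = []
    pending = None  # last somedata value still waiting for its somedata2
    for somedata, somedata2 in rows:
        if somedata:
            pending = somedata
        if somedata2 and pending is not None:
            paired.append((pending, somedata2))
            pending = None
    return paired


# Hypothetical sample shaped like the wrong result above
rows = [
    ("http://a.example", ""),
    ("", "first text"),
    ("", ""),
    ("", ""),
    ("http://b.example", ""),
    ("", "second text"),
]
print(pair_rows(rows))
```

In Spark the same idea would presumably need an explicit ordering column, since the pairing depends on the order of the lines; that part is not shown here.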
this '', please help me