I am creating a DataFrame (df) from a text file that contains JSON data. After creation, the DataFrame looks like this:
+------------------------------------------------------------------------------------+
|data |
+------------------------------------------------------------------------------------+
|"{"Zipcode":704,"ZipCodeType":"STANDARD","City":"PARC PARQUE","State":"PR"}" |
|"{"Zipcode":704,"ZipCodeType":"STANDARD","City":"PASEO COSTA DEL SUR","State":"PR"}"|
+------------------------------------------------------------------------------------+
I want to remove the double quotes at the start and end of the data column, so the final DataFrame should look like this:
+------------------------------------------------------------------------------------+
|data |
+------------------------------------------------------------------------------------+
|{"Zipcode":704,"ZipCodeType":"STANDARD","City":"PARC PARQUE","State":"PR"} |
|{"Zipcode":704,"ZipCodeType":"STANDARD","City":"PASEO COSTA DEL SUR","State":"PR"} |
+------------------------------------------------------------------------------------+
Below is the code I wrote to remove the double quote from the start:
df = df.withColumn('data1', F.regexp_replace("data",'^\"{\"','{\"'))
But I get this error:
^"{" ^ at java.util.regex.Pattern.error(Pattern.java:1957)
Can you help me fix this?
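Answer 0: (score: 2)
You only need to adjust your regex slightly. The quotes do not need escaping, but the brace does, because the Java regex engine behind regexp_replace treats an unescaped { as the start of a repetition quantifier: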
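from pyspark.sql import functions as F
# Raw string so that \{ reaches the Java regex engine unchanged
df2 = df.withColumn('data1', F.regexp_replace("data", r'^"\{"', '{"'))
df2.show(truncate=False)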
+------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------+
|data |data1 |
+------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------+
|"{"Zipcode":704,"ZipCodeType":"STANDARD","City":"PARC PARQUE","State":"PR"}" |{"Zipcode":704,"ZipCodeType":"STANDARD","City":"PARC PARQUE","State":"PR"}" |
|"{"Zipcode":704,"ZipCodeType":"STANDARD","City":"PASEO COSTA DEL SUR","State":"PR"}"|{"Zipcode":704,"ZipCodeType":"STANDARD","City":"PASEO COSTA DEL SUR","State":"PR"}"|
+------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------+
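Note that data1 above still ends with a double quote, because the pattern is anchored only at the start of the string. A minimal sketch of one way to strip both outer quotes in a single pass, assuming every value carries exactly one stray quote at each end; the names OUTER_QUOTES, strip_outer_quotes and with_unquoted_column are hypothetical and not part of the original question or answer:

```python
import re

# Matches one double quote at the very start OR at the very end of the value.
# No braces are involved, so nothing in the pattern needs escaping.
OUTER_QUOTES = r'^"|"$'


def strip_outer_quotes(value: str) -> str:
    """Pure-Python preview of the replacement, handy for checking the pattern."""
    return re.sub(OUTER_QUOTES, "", value)


def with_unquoted_column(df, source="data", target="data1"):
    """Apply the same pattern to a Spark DataFrame column (hypothetical helper)."""
    # Imported here so the preview above works without a Spark installation.
    from pyspark.sql import functions as F

    return df.withColumn(target, F.regexp_replace(source, OUTER_QUOTES, ""))
```

With the DataFrame from the question, this could be called as with_unquoted_column(df).show(truncate=False). The pattern only touches the first and last character, so quotes inside the JSON text are left alone.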