Question

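I have an AVRO-format table in Hive. One of its columns (string data type) contains data with newline characters, so when I select from it (using beeline or pyspark) I get multiple lines per row. I tried REGEXP_REPLACE(col1, "\n", "") in my select, but it still returns multiple lines.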

When I copy and paste it into a text editor, the value of col1 looks like this:

NY - Enjoy holidays or Enjoy leaves.  
Silver 2000 plan
Silver 2000 plan CSR 1
Silver 2000 plan CSR 2
Gold 600 plan
Enjoy, holidays then leaves for ER, UC and old age only.  Primary holidays not subject to Enjoy.

Are there any alternatives here?

Answer 1

Try

regexp_replace(col1, '\\\\n', "")

Example

hive> select * from temp.test4;
OK
1   abc\nxyz
Time taken: 0.169 seconds, Fetched: 1 row(s)
hive> select id, regexp_replace(value, '\\\\n', "") from temp.test4;
OK
1   abcxyz
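
Note that the pattern in this answer targets the two-character text `\n` (a backslash followed by the letter n), which is what the sample row above contains, rather than a real line-feed character. The sketch below uses Python's `re` module as a stand-in for Hive's Java-based regex engine to show the difference. The sample strings and variable names are made up for illustration, and it assumes that Hive unescapes '\\\\n' to the regex `\\n` before matching.

```python
import re

# Hypothetical sample values, not taken from any real table.
literal_text = "abc\\nxyz"   # 8 characters: contains a backslash and the letter n
real_newline = "abc\nxyz"    # 7 characters: contains an actual line feed

# Regex matching a backslash followed by 'n' (assumed to be what '\\\\n'
# reduces to once Hive has unescaped the string literal).
literal_pattern = r"\\n"
# Regex matching an actual line-feed character.
newline_pattern = r"\n"

cleaned_literal = re.sub(literal_pattern, "", literal_text)   # backslash-n removed
untouched = re.sub(literal_pattern, "", real_newline)         # no match, unchanged
cleaned_newline = re.sub(newline_pattern, "", real_newline)   # line feed removed

print(repr(cleaned_literal))
print(repr(untouched))
print(repr(cleaned_newline))
```

If the column holds real line breaks, as the multi-line value in the question suggests, the literal-backslash pattern leaves them in place and a pattern for the actual control characters is needed instead.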

Answer 2

Solved it using regexp_replace(regexp_replace(col1, '\r', ''), '\n', '')
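
That this worked suggests the data contained carriage returns as well as line feeds, so removing `\n` alone was not enough. The two nested calls can also be expressed as one character class. Below is a minimal Python sketch of that idea; the `flatten` helper and the sample value are hypothetical. In Hive the equivalent single call would presumably be regexp_replace(col1, '[\\r\\n]+', ' '), though that form is untested here and the escaping may need adjusting for your client.

```python
import re

# Matches one or more consecutive carriage returns or line feeds.
LINE_BREAKS = re.compile(r"[\r\n]+")


def flatten(value, replacement=" "):
    """Collapse every run of CR/LF characters into `replacement`."""
    return LINE_BREAKS.sub(replacement, value)


# Hypothetical value mixing Windows-style (CRLF) and Unix-style (LF) breaks.
sample = "Silver 2000 plan\r\nSilver 2000 plan CSR 1\nGold 600 plan"
flat = flatten(sample)
print(flat)
```

Replacing with a space rather than an empty string keeps the words on adjacent lines from running together; pass `""` as the second argument to reproduce the behavior of the nested calls exactly.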

How to replace newline characters when selecting from a Hive table
