What is the Spark SQL replacement for Redshift's POSIX operator?

Posted: 2019-10-24 09:31:27

Tags: apache-spark apache-spark-sql pyspark-sql

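I am running the following Redshift SQL commands, which use the POSIX operator (~) for pattern matching. They should return true if the string contains 9 consecutive digits anywhere, and false otherwise: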

select '123456789' ~ '\\d{9}' as val;  --TRUE
select 'abcd123456789' ~ '\\d{9}' as val;  --TRUE

How can I do the same kind of pattern matching in Spark SQL?
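To pin down the semantics being asked for, here is a small plain-Python sketch (not Spark or Redshift code). re.search succeeds when the pattern occurs anywhere in the string, which is what ~ does in the examples above, whereas re.fullmatch would require the whole string to be nine digits. The names NINE_DIGITS and has_nine_digits are hypothetical, and re.ASCII is an assumption that restricts \d to the digits 0-9.

```python
import re

# Hypothetical helper: nine consecutive ASCII digits (re.ASCII limits \d to 0-9).
NINE_DIGITS = re.compile(r"\d{9}", re.ASCII)


def has_nine_digits(s: str) -> bool:
    # search() succeeds if the pattern occurs anywhere in s,
    # which is the behaviour the question expects from Redshift's ~.
    return NINE_DIGITS.search(s) is not None


for s in ["123456789", "abcd123456789", "12345678abcd"]:
    # fullmatch() is printed for contrast: it requires the whole string to match.
    print(s, has_nine_digits(s), NINE_DIGITS.fullmatch(s) is not None)
```

The second column is the "anywhere" check the question wants; the third shows that a whole-string match would wrongly reject abcd123456789.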

1 answer:

Answer 0 (score: 0)

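rlike should do the trick. Like Redshift's ~ operator, it returns true if the regular expression matches anywhere in the string: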

# Raw strings pass '\\d{9}' to Spark unchanged; Spark's SQL parser unescapes it to the regex \d{9}
spark.sql(r"""SELECT '123456789' rlike '\\d{9}' as val""").show()
spark.sql(r"""SELECT 'ab123456789' rlike '\\d{9}' as val""").show()
spark.sql(r"""SELECT '123456789abcd' rlike '\\d{9}' as val""").show()

Each of these returns:

+----+
| val|
+----+
|true|
+----+

And:

spark.sql(r"""SELECT '12345678abcd' rlike '\\d{9}' as val""").show()

Returns:

+-----+
|  val|
+-----+
|false|
+-----+
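A note on the pattern string: in the queries above it passes through two layers of escaping. Python's string literal comes first (a raw string containing '\\d{9}', or a regular string written as '\\\d{9}', since Python leaves the unknown escape \d untouched), so the SQL text Spark receives contains \\d{9}. Spark's SQL parser then treats \\ inside a string literal as an escaped backslash, leaving the regex \d{9}. The plain-Python sketch below models those two steps. Its unescaping is a deliberately simplified assumption that handles only \\, not Spark's full literal rules, and it assumes the default spark.sql.parser.escapedStringLiterals=false.

```python
import re

# Raw string: the SQL text handed to spark.sql() contains '\\d{9}' (two backslashes).
query = r"""SELECT 'ab123456789' rlike '\\d{9}' as val"""

# Extract the SQL string literal that rlike receives as its pattern.
sql_literal = re.search(r"rlike '([^']*)'", query).group(1)

# Simplified model of Spark's default literal unescaping:
# only "\\" -> "\" is handled here, nothing else.
regex = sql_literal.replace("\\\\", "\\")

print(sql_literal)  # \\d{9}
print(regex)        # \d{9}

# rlike matches anywhere in the string, like re.search (ASCII digits assumed).
for s in ["123456789", "ab123456789", "123456789abcd", "12345678abcd"]:
    print(s, re.search(regex, s, re.ASCII) is not None)
```

The final loop mirrors the four queries in the answer and prints True, True, True, False. With the DataFrame API there is no SQL literal to parse, so only the Python layer applies and the regex is passed directly, e.g. F.col("s").rlike(r"\d{9}").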