Add quotes around each element of a list

Posted: 2019-07-29 13:30:59

Tags: scala apache-spark apache-spark-sql

I need to select data from a Teradata table using a variable I created earlier in Spark:

%spark
sqlContext.setConf("spark.sql.parquet.writeLegacyFormat", "true")
val query = "select distinct cod_contrato from xxx.contratos"
val df = sqlContext.sql(query)
val dfv = df.select("cod_contrato")

The values in it (cod_contrato) are strings.

So I want to query the data using this vector of strings.

If I use:

%spark

val sql = s"(SELECT * FROM xx2.CONTRATOS where cod_contrato in '$dfv') as query"

I get:

(SELECT * FROM xx2.CONTRATOS where cod_contrato in '[cod_contrato: string]') as query
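The placeholder appears because s"..." interpolation calls toString on the embedded value, and a DataFrame's toString shows its schema rather than its rows. A plain Scala collection behaves the same way, so the effect can be reproduced without Spark. In this sketch, codes is a hypothetical stand-in for the already-collected values, using the two example codes from the desired result:

```scala
// Hypothetical stand-in for values already collected from the DataFrame
val codes = Seq("11111", "11112")

// Interpolating the collection itself embeds codes.toString, i.e. "List(11111, 11112)"
val wrong = s"SELECT * FROM xx2.CONTRATOS where cod_contrato in '$codes'"

// Formatting the elements explicitly produces the intended IN list
val inList = codes.mkString("('", "', '", "')")
val right = s"SELECT * FROM xx2.CONTRATOS where cod_contrato in $inList"

println(wrong)
println(right)
```

With a DataFrame, the rows first have to be brought to the driver (for example with collect) before they can be formatted like this.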

The desired result would be:

SELECT * FROM xx2.CONTRATOS where cod_contrato in ('11111', '11112' )

How can I convert the vector into a list wrapped in () with quotes around each element?

Thanks

2 answers:

Answer 0 (score: 0)

Here is my attempt, on a sample DataFrame:

val test = df.select("id").as[String].collect
> test: Array[String] = Array(6597, 8011, 2597, 5022, 5022, 6852, 6852, 5611, 14838, 14838, 2588, 2588)

test is now an Array[String], so mkString can build the quoted, parenthesized list:

val sql = "(SELECT * FROM xx2.CONTRATOS where cod_contrato in " + test.mkString("('", "','", "')") + ") as query"
> sql: String = (SELECT * FROM xx2.CONTRATOS where cod_contrato in ('6597','8011','2597','5022','5022','6852','6852','5611','14838','14838','2588','2588')) as query

The final result is now a string.
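If the codes can repeat (as 5022 and 6852 do in the sample output) or might contain a single quote, a slightly more defensive sketch can deduplicate and escape them before calling mkString. The helper name sqlInList is hypothetical, and doubling ' to '' assumes standard SQL string-literal quoting; for untrusted input, a join or bind-parameter approach is safer than building SQL strings.

```scala
// Hypothetical helper: renders values as a SQL IN list such as ('11111', '11112').
// Assumes standard SQL quoting, where an embedded ' is written as ''.
def sqlInList(values: Seq[String]): String =
  values.distinct                                // drop repeated codes, keeping first occurrence order
    .map(v => "'" + v.replace("'", "''") + "'")  // escape embedded quotes, then wrap in quotes
    .mkString("(", ", ", ")")                    // join and surround with parentheses

// Example values; in Zeppelin these would come from something like test above
val codes = Seq("6597", "8011", "5022", "5022", "12'34")

// Derived-table form, with outer parentheses and an alias
val sql = "(SELECT * FROM xx2.CONTRATOS where cod_contrato in " + sqlInList(codes) + ") as query"

println(sql)
```

An empty input would produce IN (), which is not valid SQL, so check for that case before building the query.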

Answer 1 (score: 0)

Create a temporary view of the values you want to filter on, then reference it in the query:

%spark
sqlContext.setConf("spark.sql.parquet.writeLegacyFormat", "true")
val query = "select distinct cod_contrato from xxx.contratos"
sqlContext.sql(query).selectExpr("cast(cod_contrato as string)").createOrReplaceTempView("dfv_table")

val sql = "(SELECT * FROM xx2.CONTRATOS where cod_contrato in (select * from dfv_table)) as query"

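This works when the query itself runs in Spark SQL, but it does not produce a query string with the values spelled out. The temporary view exists only inside Spark, so it cannot be referenced from a query that is sent to Teradata. If all you need is the query as a string, Lamanus's answer should be enough.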