I need to select data from a Teradata table using a variable previously created in Spark:
%spark
sqlContext.setConf("spark.sql.parquet.writeLegacyFormat", "true")
val query = "select distinct cod_contrato from xxx.contratos"
val df = sqlContext.sql(query)
val dfv = df.select("cod_contrato")
The variable is a string. So I want to query the data using this vector of strings.
If I use:
%spark
val sql = s"(SELECT * FROM xx2.CONTRATOS where cod_contrato in '$dfv') as query"
I get:
(SELECT * FROM xx2.CONTRATOS where cod_contrato in '[cod_contrato: string]') as query
The desired result would be:
SELECT * FROM xx2.CONTRATOS where cod_contrato in ('11111', '11112' )
How can I convert the vector into a list wrapped in parentheses, with each element quoted?
Thanks
Answer 0 (score: 0)
Here is my attempt. Given some dataframe,
val test = df.select("id").as[String].collect  // .as[String] requires import spark.implicits._
> test: Array[String] = Array(6597, 8011, 2597, 5022, 5022, 6852, 6852, 5611, 14838, 14838, 2588, 2588)
so test is now an Array[String]. Then, using mkString,
val sql = s"SELECT * FROM xx2.CONTRATOS where cod_contrato in " + test.mkString("('", "','", "')") + " as query"
> sql: String = SELECT * FROM xx2.CONTRATOS where cod_contrato in ('6597','8011','2597','5022','5022','6852','6852','5611','14838','14838','2588','2588') as query
Now the final result is a string.
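As a hedged illustration of where such a string is typically used, here is a minimal sketch that passes it as the dbtable subquery of a JDBC read; the Teradata host, user, and password below are placeholders, not values from the original post.
%spark
// Minimal sketch: push the built "( ... ) as query" string down as a JDBC subquery.
// The url, user and password values are placeholders for illustration only.
val contratos = sqlContext.read
  .format("jdbc")
  .option("url", "jdbc:teradata://<host>/DATABASE=xx2")  // placeholder connection URL
  .option("driver", "com.teradata.jdbc.TeraDriver")      // Teradata JDBC driver class
  .option("dbtable", sql)                                 // the string built above
  .option("user", "<user>")
  .option("password", "<password>")
  .load()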
Answer 1 (score: 0)
Create a temporary view of the values you want to filter on, then reference it in the query:
%spark
sqlContext.setConf("spark.sql.parquet.writeLegacyFormat", "true")
val query = "select distinct cod_contrato from xxx.contratos"
sqlContext.sql(query).selectExpr("cast(cod_contrato as string)").createOrReplaceTempView("dfv_table")
val sql = "(SELECT * FROM xx2.CONTRATOS where cod_contrato in (select * from dfv_table)) as query"
This will work for running the query inside Spark SQL, but it will not return the query string. If all you want is the query as a string, Lamanus's answer should be sufficient.
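For completeness, a minimal sketch of actually executing that filter inside Spark SQL, assuming xx2.CONTRATOS is visible in Spark's catalog; the outer "( ... ) as query" wrapper is only needed when pushing the statement down as a JDBC subquery, so it is dropped here.
%spark
// Assumes xx2.CONTRATOS is registered in Spark's catalog and dfv_table was created as shown above.
val filtered = sqlContext.sql(
  "SELECT * FROM xx2.CONTRATOS WHERE cod_contrato IN (SELECT * FROM dfv_table)"
)
filtered.show()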