How to turn a list into a comma-separated string in a PySpark query?

Asked: 2019-03-21 20:25:18

Tags: pyspark pyspark-sql

I want to generate a query from a list in PySpark:

list = ["hi@gmail.com", "goodbye@gmail.com"]
query = "SELECT * FROM table WHERE email IN (" + list + ")"

This is the output I want:

query
SELECT * FROM table WHERE email IN ("hi@gmail.com", "goodbye@gmail.com")

Instead, I get: TypeError: cannot concatenate 'str' and 'list' objects

Can anyone help me achieve this? Thanks.

2 answers:

Answer 0 (score: 0)

In case anyone runs into the same problem, I found that you can use the following code:

"'"+"','".join(map(str, emails))+"'"

and you will get the following output:

SELECT * FROM table WHERE email IN ('hi@gmail.com','goodbye@gmail.com')
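Put together as a runnable sketch (plain Python, no Spark session needed; `emails` here stands in for the asker's list, which shadowed the built-in name `list`):

```python
# Plain-Python sketch of the join trick above; no Spark required.
emails = ["hi@gmail.com", "goodbye@gmail.com"]

# Wrap each address in single quotes and join the quoted values with commas.
in_list = "'" + "','".join(map(str, emails)) + "'"
query = "SELECT * FROM table WHERE email IN (" + in_list + ")"
print(query)
# → SELECT * FROM table WHERE email IN ('hi@gmail.com','goodbye@gmail.com')
```

The original error came from concatenating a `str` directly with a `list`; joining the list into a single string first is what makes the concatenation valid.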

Answer 1 (score: 0)

Try this:

DataFrame-based approach:

from pyspark.sql.functions import col  # needed for the col() call below

df = spark.createDataFrame(
    [(1, "hi@gmail.com"), (2, "goodbye@gmail.com"), (3, "abc@gmail.com"), (4, "xyz@gmail.com")],
    ['id', 'email_id'])

email_filter_list = ["hi@gmail.com", "goodbye@gmail.com"]

df.where(col('email_id').isin(email_filter_list)).show()

Spark SQL-based approach:

df = spark.createDataFrame(
    [(1, "hi@gmail.com"), (2, "goodbye@gmail.com"), (3, "abc@gmail.com"), (4, "xyz@gmail.com")],
    ['id', 'email_id'])
df.createOrReplaceTempView('t1')

sql_filter = ','.join(["'" + i + "'" for i in email_filter_list])

spark.sql("SELECT * FROM t1 WHERE email_id IN ({})".format(sql_filter)).show()
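One caveat with either string-building approach: a value containing a single quote will break the generated SQL. A minimal sketch of escaping by doubling embedded quotes (standard SQL string-literal escaping); the helper name `quote_sql_string` is my own:

```python
def quote_sql_string(value):
    """Wrap a value in single quotes, doubling any embedded quotes (standard SQL escaping)."""
    return "'" + str(value).replace("'", "''") + "'"

email_filter_list = ["hi@gmail.com", "o'brien@gmail.com"]
sql_filter = ','.join(quote_sql_string(i) for i in email_filter_list)
# sql_filter is now: 'hi@gmail.com','o''brien@gmail.com'
```

For a fixed, trusted list of addresses the plain join above is fine; for values from user input, the DataFrame `isin` approach avoids the quoting problem entirely.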