Using Spark SQL to join two or more tables with two SELECT statements

Asked: 2015-10-16 06:49:22

Tags: apache-spark apache-spark-sql

This is my statement:

val Porders = sqlContext.sql(
    """SELECT count(STATUS_CD) 
    FROM s_order 
    WHERE STATUS_CD = 'pending' AND ROW_ID IN 
        ( SELECT so.ROW_ID FROM s_order so 
        JOIN s_order_item soi 
        ON so.ROW_ID = soi.ORDER_ID 
        JOIN s_order_type sot 
        ON so.ORDER_TYPE_ID = sot.ROW_ID 
        JOIN s_product sp 
        ON soi.PROD_ID = sp.ROW_ID
        WHERE (sp.NAME like '%VIP%' OR sp.NAME like '%BIZ%' OR sp.NAME like '%UniFi%') 
        AND LOWER(sot.NAME) = 'new install')
    """)

I get the following error:

ERROR : java.lang.RuntimeException: [3.3] failure: identifier expected
( SELECT so.ROW_ID FROM s_order so JOIN s_order_item soi 
  ^

What could be the cause?

1 answer:

Answer 0 (score: 1)

The reason this happens is that subqueries in the WHERE clause are not supported: see SPARK-4226.

Even a query as simple as
sqlContext.sql(
  """SELECT count(STATUS_CD)
     FROM s_order
     WHERE STATUS_CD = 'pending' AND ROW_ID IN
       (SELECT * FROM s_order)
  """)

does not currently work (as of Spark SQL 1.5.1).

Try replacing the subquery with a join, as described for example at https://dev.mysql.com/doc/refman/5.1/en/rewriting-subqueries.html
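For the query in the question, that rewrite is straightforward, since the outer query and the subquery both read from `s_order`: fold the `STATUS_CD` filter into the joined query and count distinct order IDs. The sketch below assumes the same `sqlContext` and table/column names as the original; `COUNT(DISTINCT so.ROW_ID)` replaces `COUNT(STATUS_CD)` because the joins can produce one row per matching order item, whereas the `IN` version counted each order at most once.

```scala
// Sketch: subquery rewritten as plain joins (Spark SQL 1.5.x compatible).
// Assumes the same tables as the original query are registered in sqlContext.
val Porders = sqlContext.sql(
  """SELECT COUNT(DISTINCT so.ROW_ID)
     FROM s_order so
     JOIN s_order_item soi ON so.ROW_ID = soi.ORDER_ID
     JOIN s_order_type sot ON so.ORDER_TYPE_ID = sot.ROW_ID
     JOIN s_product sp ON soi.PROD_ID = sp.ROW_ID
     WHERE so.STATUS_CD = 'pending'
       AND (sp.NAME LIKE '%VIP%' OR sp.NAME LIKE '%BIZ%' OR sp.NAME LIKE '%UniFi%')
       AND LOWER(sot.NAME) = 'new install'
  """)
```

If `STATUS_CD` can be NULL, note that `COUNT(STATUS_CD)` skipped NULLs while `COUNT(DISTINCT so.ROW_ID)` does not; add `AND so.STATUS_CD IS NOT NULL` if you need that behavior preserved (with the literal filter `STATUS_CD = 'pending'` it cannot be NULL anyway).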