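From my research on Spark SQL, I learned that more than 2 tables cannot be joined directly, so we have to use subqueries to make it work. Using a subquery, I was able to join 3 tables with the following query: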
"SELECT name, age, gender, dpi.msisdn, subscriptionType, maritalStatus, isHighARPU, ipAddress, startTime, endTime, isRoaming, dpi.totalCount, dpi.website FROM (SELECT subsc.name, subsc.age, subsc.gender, subsc.msisdn, subsc.subscriptionType, subsc.maritalStatus, subsc.isHighARPU, cdr.ipAddress, cdr.startTime, cdr.endTime, cdr.isRoaming FROM SUBSCRIBER_META subsc, CDR_FACT cdr WHERE subsc.msisdn = cdr.msisdn AND cdr.isRoaming = 'Y') temp, DPI_FACT dpi WHERE temp.msisdn = dpi.msisdn";
But when I try to join 4 tables following the same pattern, I get the following exception:
java.lang.RuntimeException: [1.517] failure: identifier expected
Query joining 4 tables:
SELECT name, dueAmount FROM (SELECT name, age, gender, dpi.msisdn, subscriptionType, maritalStatus, isHighARPU, ipAddress, startTime, endTime, isRoaming, dpi.totalCount, dpi.website FROM (SELECT subsc.name, subsc.age, subsc.gender, subsc.msisdn, subsc.subscriptionType, subsc.maritalStatus, subsc.isHighARPU, cdr.ipAddress, cdr.startTime, cdr.endTime, cdr.isRoaming FROM SUBSCRIBER_META subsc, CDR_FACT cdr WHERE subsc.msisdn = cdr.msisdn AND cdr.isRoaming = 'Y') temp, DPI_FACT dpi WHERE temp.msisdn = dpi.msisdn) inner, BILLING_META billing WHERE inner.msisdn = billing.msisdn
Can anyone help me get this query working?
Thanks in advance. Here is the full error:
09/02/2015 02:55:24 [ERROR] org.apache.spark.Logging$class: Error running job streaming job 1423479307000 ms.0
java.lang.RuntimeException: [1.517] failure: identifier expected
SELECT name, dueAmount FROM (SELECT name, age, gender, dpi.msisdn, subscriptionType, maritalStatus, isHighARPU, ipAddress, startTime, endTime, isRoaming, dpi.totalCount, dpi.website FROM (SELECT subsc.name, subsc.age, subsc.gender, subsc.msisdn, subsc.subscriptionType, subsc.maritalStatus, subsc.isHighARPU, cdr.ipAddress, cdr.startTime, cdr.endTime, cdr.isRoaming FROM SUBSCRIBER_META subsc, CDR_FACT cdr WHERE subsc.msisdn = cdr.msisdn AND cdr.isRoaming = 'Y') temp, DPI_FACT dpi WHERE temp.msisdn = dpi.msisdn) inner, BILLING_META billing where inner.msisdn = billing.msisdn
^
at scala.sys.package$.error(package.scala:27)
at org.apache.spark.sql.catalyst.SqlParser.apply(SqlParser.scala:60)
at org.apache.spark.sql.SQLContext.parseSql(SQLContext.scala:73)
at org.apache.spark.sql.api.java.JavaSQLContext.sql(JavaSQLContext.scala:49)
at com.hp.tbda.rta.examples.JdbcRDDStreaming5$7.call(JdbcRDDStreaming5.java:596)
at com.hp.tbda.rta.examples.JdbcRDDStreaming5$7.call(JdbcRDDStreaming5.java:546)
at org.apache.spark.streaming.api.java.JavaDStreamLike$$anonfun$foreachRDD$1.apply(JavaDStreamLike.scala:274)
at org.apache.spark.streaming.api.java.JavaDStreamLike$$anonfun$foreachRDD$1.apply(JavaDStreamLike.scala:274)
at org.apache.spark.streaming.dstream.DStream$$anonfun$foreachRDD$1.apply(DStream.scala:527)
at org.apache.spark.streaming.dstream.DStream$$anonfun$foreachRDD$1.apply(DStream.scala:527)
at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply$mcV$sp(ForEachDStream.scala:41)
at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply(ForEachDStream.scala:40)
at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply(ForEachDStream.scala:40)
at scala.util.Try$.apply(Try.scala:161)
at org.apache.spark.streaming.scheduler.Job.run(Job.scala:32)
at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler.run(JobScheduler.scala:172)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
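A likely cause: in the Spark 1.x SQL parser (SqlParser in the trace above), INNER is a keyword, so it cannot be used as a table alias. The reported position (line 1, column 517) falls right around where the alias inner appears after "dpi.msisdn)", which fits an "identifier expected" failure at that token. The sketch below is a hypothetical helper (FourTableJoin and buildQuery are not part of Spark) that assembles the same 4-table query with the derived table aliased as joined instead, and rejects a few aliases that are SQL keywords. The keyword set is an illustrative assumption, not Spark's actual reserved-word list.

```java
import java.util.Locale;
import java.util.Set;

// Hypothetical helper (not part of Spark): builds the 4-table query from the
// question, letting the caller choose the alias of the outer derived table.
public final class FourTableJoin {

    // Illustrative subset of SQL keywords that should not be used as an alias.
    // Assumption: this is NOT Spark's full reserved-word list.
    private static final Set<String> RESERVED =
            Set.of("inner", "outer", "join", "left", "right", "on", "where", "from", "select");

    private FourTableJoin() {}

    public static String buildQuery(String alias) {
        if (RESERVED.contains(alias.toLowerCase(Locale.ROOT))) {
            throw new IllegalArgumentException("Alias is a reserved keyword: " + alias);
        }
        // Innermost subquery: roaming subscribers joined with their CDR records (aliased temp).
        String subscriberCdr =
                "SELECT subsc.name, subsc.age, subsc.gender, subsc.msisdn, "
              + "subsc.subscriptionType, subsc.maritalStatus, subsc.isHighARPU, "
              + "cdr.ipAddress, cdr.startTime, cdr.endTime, cdr.isRoaming "
              + "FROM SUBSCRIBER_META subsc, CDR_FACT cdr "
              + "WHERE subsc.msisdn = cdr.msisdn AND cdr.isRoaming = 'Y'";
        // Middle subquery: the result above joined with DPI_FACT.
        String withDpi =
                "SELECT name, age, gender, dpi.msisdn, subscriptionType, maritalStatus, "
              + "isHighARPU, ipAddress, startTime, endTime, isRoaming, "
              + "dpi.totalCount, dpi.website "
              + "FROM (" + subscriberCdr + ") temp, DPI_FACT dpi "
              + "WHERE temp.msisdn = dpi.msisdn";
        // Outer query: join with BILLING_META using a non-reserved alias.
        return "SELECT name, dueAmount "
             + "FROM (" + withDpi + ") " + alias + ", BILLING_META billing "
             + "WHERE " + alias + ".msisdn = billing.msisdn";
    }

    public static void main(String[] args) {
        String query = buildQuery("joined");
        System.out.println(query);
        // In the streaming job, this string would be passed to sqlContext.sql(query),
        // where sqlContext is the JavaSQLContext already used in the question.
    }
}
```

Only the alias changes compared with the failing query; any name that is not a keyword (joined, t3, and so on) should serve the same purpose.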