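I have installed Cassandra and Spark with SparkSQL on my machine. According to the documentation, Spark SQL supports the JOIN keyword.
Supported Spark SQL syntax: the following syntax defines a SELECT query.
SELECT [DISTINCT] [column names] | [wildcard] FROM [keyspace name.]table name [JOIN clause table name ON join condition] [WHERE condition] [GROUP BY column name] [HAVING conditions] [ORDER BY column names [ASC | DESC]]
I have the following code: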
SparkConf conf = new SparkConf().setAppName("My application").setMaster("local");
conf.set("spark.cassandra.connection.host", "localhost");
JavaSparkContext sc = new JavaSparkContext(conf);
CassandraConnector connector = CassandraConnector.apply(sc.getConf());
Session session = connector.openSession();
ResultSet results;
String sql ="";
BufferedReader bufferedReader = new BufferedReader(new InputStreamReader(System.in));
sql = "SELECT * from siem.report JOIN siem.netstat on siem.report.REPORTUUID = siem.netstat.NETSTATREPORTUUID ALLOW FILTERING;";
results = session.execute(sql);
I get the following error:
Caused by: com.datastax.driver.core.exceptions.SyntaxError: line 1:25 missing EOF at ',' (SELECT * from siem.report [,] siem...)
11:14 AM com.datastax.driver.core.exceptions.SyntaxError: line 1:25 missing EOF at ',' (SELECT * from siem.report [,] siem...)
    at com.datastax.driver.core.exceptions.SyntaxError.copy(SyntaxError.java:58)
    at com.datastax.driver.core.exceptions.SyntaxError.copy(SyntaxError.java:24)
    at com.datastax.driver.core.DriverThrowables.propagateCause(DriverThrowables.java:37)
    at com.datastax.driver.core.DefaultResultSetFuture.getUninterruptibly(DefaultResultSetFuture.java:245)
    at com.datastax.driver.core.AbstractSession.execute(AbstractSession.java:63)
    at com.datastax.driver.core.AbstractSession.execute(AbstractSession.java:39)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:483)
    at com.datastax.spark.connector.cql.SessionProxy.invoke(SessionProxy.scala:33)
    at com.sun.proxy.$Proxy59.execute(Unknown Source)
    at com.ge.predix.rmd.siem.boot.PersistenceTest.test_QuerySparkOnReport_GIACOMO_LogDao(PersistenceTest.java:178)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:483)
    at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
    at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
    at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
    at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
    at org.springframework.test.context.junit4.statements.RunBeforeTestMethodCallbacks.evaluate(RunBeforeTestMethodCallbacks.java:73)
    at org.springframework.test.context.junit4.statements
I also tried:
SELECT * from siem.report JOIN siem.netstat on report.REPORTUUID = netstat.NETSTATREPORTUUID ALLOW FILTERING
I also tried:
SELECT * from siem.report R JOIN siem.netstat N on R.REPORTUUID = N.NETSTATREPORTUUID ALLOW FILTERING
Can anyone help me? Am I actually using SparkSQL, or CQL?
I then tried this:
public void test_JOIN_on_Cassandra () {
SparkConf conf = new SparkConf().setAppName("My application").setMaster("local");
conf.set("spark.cassandra.connection.host", "localhost");
JavaSparkContext sc = new JavaSparkContext(conf);
SQLContext sqlContext = new SQLContext(sc);
try {
//QueryExecution test1 = sqlContext.executeSql("SELECT * from siem.report");
//QueryExecution test2 = sqlContext.executeSql("SELECT * from siem.report JOIN siem.netstat on report.REPORTUUID = netstat.NETSTATREPORTUUID");
QueryExecution test3 = sqlContext.executeSql("SELECT * from siem.report JOIN siem.netstat on siem.report.REPORTUUID = siem.netstat.NETSTATREPORTUUID");
} catch (Exception e) {
e.printStackTrace();
}
// SchemaRDD results = sc.sql("SELECT * from siem.report JOIN siem.netstat on siem.report.REPORTUUID = siem.netstat.NETSTATREPORTUUID");
}
and I got:
== Parsed Logical Plan ==
'Project [unresolvedalias(*)]
+- 'Join Inner, Some(('siem.report.REPORTUUID = 'siem.netstat.NETSTATREPORTUUID))
   :- 'UnresolvedRelation `siem`.`report`, None
   +- 'UnresolvedRelation `siem`.`netstat`, None

== Analyzed Logical Plan ==
org.apache.spark.sql.catalyst.analysis.UnresolvedException: Invalid call to toAttribute on unresolved object, tree: unresolvedalias(*)
'Project [unresolvedalias(*)]
+- 'Join Inner, Some(('siem.report.REPORTUUID = 'siem.netstat.NETSTATREPORTUUID))
   :- 'UnresolvedRelation `siem`.`report`, None
   +- 'UnresolvedRelation `siem`.`netstat`, None

== Optimized Logical Plan ==
org.apache.spark.sql.AnalysisException: Table not found: `siem`.`report`;

== Physical Plan ==
org.apache.spark.sql.AnalysisException: Table not found: `siem`.`report`;
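Answer 0 (score: 2)
It looks like you are mixing a few concepts here, and that is what produces the error. The session you create opens a direct line to Cassandra, which means it accepts CQL, not SQL. CQL has no JOIN clause, so the parser rejects the statement with a SyntaxError. If you want to run SQL, a small change is enough: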
SparkConf conf = new SparkConf().setAppName("My application").setMaster("local");
conf.set("spark.cassandra.connection.host", "localhost");
JavaSparkContext sc = new JavaSparkContext(conf);
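// Assumes Spark 1.x with a 1.x spark-cassandra-connector, where
// org.apache.spark.sql.cassandra.CassandraSQLContext resolves keyspace.table names
// and sql() returns an org.apache.spark.sql.DataFrame.
CassandraSQLContext sqlContext = new CassandraSQLContext(sc.sc());
DataFrame results = sqlContext.sql("SELECT * from siem.report JOIN siem.netstat on siem.report.REPORTUUID = siem.netstat.NETSTATREPORTUUID");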
This way you invoke Spark SQL from the Spark context instead of connecting directly to Cassandra. More information: http://docs.datastax.com/en/latest-dse/datastax_enterprise/spark/sparkSqlJava.html
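To make the CQL limitation concrete: since CQL itself cannot join, the only way to combine the two tables without Spark SQL would be to read both and match rows on the client. The following is only a minimal sketch of that idea in plain Java, assuming the rows have already been fetched into maps keyed by column name. The class name ClientSideJoin, the row helper, and the sample values are hypothetical and not part of any Cassandra or Spark API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration: an in-memory inner hash join, i.e. the work
// that Spark SQL would otherwise do for a JOIN that CQL cannot express.
final class ClientSideJoin {

    /** Pairs each left row with every right row whose join key is equal. */
    public static List<Map<String, Object>> hashJoin(
            List<Map<String, Object>> left, String leftKey,
            List<Map<String, Object>> right, String rightKey) {
        // Build phase: index the right-hand rows by their join key.
        Map<Object, List<Map<String, Object>>> index = new HashMap<>();
        for (Map<String, Object> row : right) {
            Object key = row.get(rightKey);
            if (key == null) {
                continue; // a missing key never matches in an inner join
            }
            index.computeIfAbsent(key, k -> new ArrayList<>()).add(row);
        }
        // Probe phase: look up each left-hand row in the index.
        List<Map<String, Object>> joined = new ArrayList<>();
        for (Map<String, Object> row : left) {
            Object key = row.get(leftKey);
            if (key == null) {
                continue;
            }
            List<Map<String, Object>> matches = index.get(key);
            if (matches == null) {
                continue; // no partner row, so the left row is dropped
            }
            for (Map<String, Object> match : matches) {
                // On a column-name clash the right-hand value wins.
                Map<String, Object> merged = new HashMap<>(row);
                merged.putAll(match);
                joined.add(merged);
            }
        }
        return joined;
    }

    /** Convenience helper that builds a two-column row. */
    public static Map<String, Object> row(String k1, Object v1, String k2, Object v2) {
        Map<String, Object> r = new HashMap<>();
        r.put(k1, v1);
        r.put(k2, v2);
        return r;
    }

    public static void main(String[] args) {
        // Made-up sample data standing in for siem.report and siem.netstat.
        List<Map<String, Object>> reports = new ArrayList<>();
        reports.add(row("reportuuid", "r1", "host", "alpha"));
        reports.add(row("reportuuid", "r2", "host", "beta"));

        List<Map<String, Object>> netstats = new ArrayList<>();
        netstats.add(row("netstatreportuuid", "r1", "port", 22));
        netstats.add(row("netstatreportuuid", "r1", "port", 443));
        netstats.add(row("netstatreportuuid", "r3", "port", 80));

        for (Map<String, Object> r : hashJoin(reports, "reportuuid", netstats, "netstatreportuuid")) {
            System.out.println(r);
        }
    }
}
```

This pulls both tables into memory, so it is only reasonable for small data sets; for anything larger, running the JOIN through Spark SQL as described above is the better fit.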