Question

I have installed Cassandra and Spark with SparkSQL on my machine. According to the documentation, Spark SQL supports the JOIN keyword:

https://docs.datastax.com/en/datastax_enterprise/4.6/datastax_enterprise/spark/sparkSqlSupportedSyntax.html

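Supported Spark SQL syntax. The following syntax defines a SELECT query.

SELECT [DISTINCT] [column names]|[wildcard] FROM [keyspace name.]table name [JOIN clause table name ON join condition] [WHERE condition] [GROUP BY column name] [HAVING conditions] [ORDER BY column names [ASC | DSC]]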

I have the following code:

SparkConf conf = new SparkConf().setAppName("My application").setMaster("local");
conf.set("spark.cassandra.connection.host", "localhost");
JavaSparkContext sc = new JavaSparkContext(conf);
CassandraConnector connector = CassandraConnector.apply(sc.getConf());
Session session = connector.openSession();

ResultSet results;
String sql ="";


BufferedReader bufferedReader = new BufferedReader(new InputStreamReader(System.in));
sql = "SELECT * from siem.report JOIN siem.netstat on siem.report.REPORTUUID = siem.netstat.NETSTATREPORTUUID ALLOW FILTERING;";
results = session.execute(sql);

I get the following error:

Caused by: com.datastax.driver.core.exceptions.SyntaxError: line 1:25 missing EOF at ',' (SELECT * from siem.report [,] siem...)
11:14 AM com.datastax.driver.core.exceptions.SyntaxError: line 1:25 missing EOF at ',' (SELECT * from siem.report [,] siem...)
    at com.datastax.driver.core.exceptions.SyntaxError.copy(SyntaxError.java:58)
    at com.datastax.driver.core.exceptions.SyntaxError.copy(SyntaxError.java:24)
    at com.datastax.driver.core.DriverThrowables.propagateCause(DriverThrowables.java:37)
    at com.datastax.driver.core.DefaultResultSetFuture.getUninterruptibly(DefaultResultSetFuture.java:245)
    at com.datastax.driver.core.AbstractSession.execute(AbstractSession.java:63)
    at com.datastax.driver.core.AbstractSession.execute(AbstractSession.java:39)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:483)
    at com.datastax.spark.connector.cql.SessionProxy.invoke(SessionProxy.scala:33)
    at com.sun.proxy.$Proxy59.execute(Unknown Source)
    at com.ge.predix.rmd.siem.boot.PersistenceTest.test_QuerySparkOnReport_GIACOMO_LogDao(PersistenceTest.java:178)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:483)
    at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
    at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
    at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
    at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
    at org.springframework.test.context.junit4.statements.RunBeforeTestMethodCallbacks.evaluate(RunBeforeTestMethodCallbacks.java:73)
    at org.springframework.test.context.junit4.statements

I also tried:

SELECT * from siem.report JOIN siem.netstat on report.REPORTUUID = netstat.NETSTATREPORTUUID ALLOW FILTERING

And also:

SELECT * from siem.report R JOIN siem.netstat N on R.REPORTUUID = N.NETSTATREPORTUUID ALLOW FILTERING

Can someone help me? Am I actually using SparkSQL here, or CQL?

UPDATE

I tried:

public void test_JOIN_on_Cassandra () {

        SparkConf conf = new SparkConf().setAppName("My application").setMaster("local");
        conf.set("spark.cassandra.connection.host", "localhost");
        JavaSparkContext sc = new JavaSparkContext(conf);


        SQLContext sqlContext = new SQLContext(sc);
        try {
            //QueryExecution test1 = sqlContext.executeSql("SELECT * from siem.report");
            //QueryExecution test2 = sqlContext.executeSql("SELECT * from siem.report JOIN siem.netstat on report.REPORTUUID = netstat.NETSTATREPORTUUID");
            QueryExecution test3 = sqlContext.executeSql("SELECT * from siem.report JOIN siem.netstat on siem.report.REPORTUUID = siem.netstat.NETSTATREPORTUUID");

        } catch (Exception e) {
            e.printStackTrace();
        }

       // SchemaRDD results = sc.sql("SELECT * from siem.report JOIN siem.netstat on siem.report.REPORTUUID = siem.netstat.NETSTATREPORTUUID");

}

and I get:

== Parsed Logical Plan ==
'Project [unresolvedalias(*)]
+- 'Join Inner, Some(('siem.report.REPORTUUID = 'siem.netstat.NETSTATREPORTUUID))
   :- 'UnresolvedRelation siem.report, None
   +- 'UnresolvedRelation siem.netstat, None
== Analyzed Logical Plan ==
org.apache.spark.sql.catalyst.analysis.UnresolvedException: Invalid call to toAttribute on unresolved object, tree: unresolvedalias(*)
'Project [unresolvedalias(*)]
+- 'Join Inner, Some(('siem.report.REPORTUUID = 'siem.netstat.NETSTATREPORTUUID))
   :- 'UnresolvedRelation siem.report, None
   +- 'UnresolvedRelation siem.netstat, None
== Optimized Logical Plan ==
org.apache.spark.sql.AnalysisException: Table not found: siem.report;
== Physical Plan ==
org.apache.spark.sql.AnalysisException: Table not found: siem.report;

Answer 1

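It looks like you are mixing up a few concepts here, and that is what causes the errors. The session you are creating opens a direct line to Cassandra, which means it accepts CQL, not SQL. CQL has no JOIN, so the CQL parser rejects the statement. If you want to run SQL, you can make a small change: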

SparkConf conf = new SparkConf().setAppName("My application").setMaster("local");
conf.set("spark.cassandra.connection.host", "localhost");
JavaSparkContext sc = new JavaSparkContext(conf);

SchemaRDD results = sparkContext.sql("SELECT * from siem.report JOIN siem.netstat on siem.report.REPORTUUID = siem.netstat.NETSTATREPORTUUID");

You invoke SparkSQL from the Spark context instead of connecting to Cassandra directly. More information: http://docs.datastax.com/en/latest-dse/datastax_enterprise/spark/sparkSqlJava.html
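A note for readers on open-source Spark 1.4 to 1.6 rather than DSE: JavaSparkContext has no sql() method, and a plain SQLContext knows nothing about Cassandra tables, which is consistent with the "Table not found" error in the question's update. One way around this, sketched below under assumptions, is to load each table through the connector's data source, register it as a temporary table, and join those. The sketch assumes a matching spark-cassandra-connector on the classpath and a Cassandra node on localhost. The class name CassandraJoinSketch is hypothetical, and the lower-case column names are an assumption based on Cassandra folding unquoted identifiers to lower case. It has not been run against the asker's schema.

```java
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.SQLContext;

// Hypothetical class name; a sketch for Spark 1.4-1.6 with spark-cassandra-connector.
public class CassandraJoinSketch {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf()
                .setAppName("My application")
                .setMaster("local")
                .set("spark.cassandra.connection.host", "localhost");
        JavaSparkContext sc = new JavaSparkContext(conf);
        SQLContext sqlContext = new SQLContext(sc);

        // Load each Cassandra table as a DataFrame via the connector's data source.
        DataFrame report = sqlContext.read()
                .format("org.apache.spark.sql.cassandra")
                .option("keyspace", "siem")
                .option("table", "report")
                .load();
        DataFrame netstat = sqlContext.read()
                .format("org.apache.spark.sql.cassandra")
                .option("keyspace", "siem")
                .option("table", "netstat")
                .load();

        // Register the DataFrames so Spark SQL can resolve them by name.
        report.registerTempTable("report");
        netstat.registerTempTable("netstat");

        // Assumption: columns were created unquoted, so Cassandra stores them in lower case.
        // No ALLOW FILTERING here: that clause is CQL, not Spark SQL.
        DataFrame joined = sqlContext.sql(
                "SELECT * FROM report r JOIN netstat n "
                        + "ON r.reportuuid = n.netstatreportuuid");
        joined.show();

        sc.stop();
    }
}
```

The join runs in Spark, not in Cassandra, so both tables are read through the connector before being matched.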

Joining two tables on Cassandra with SparkSQL - Error: missing EOF
