I want to pass the query as a parameter, but it gives the following error.
url = 'jdbc:cassandra://localhost:9042/tutorialspoint'
query = 'select * from emp LIMIT 10'
df = spark_sql_context.read.format('jdbc')\
    .option("driver", "com.dbschema.CassandraJdbcDriver")\
    .option("url", url)\
    .option("dbtable", query)\
    .option("numPartitions", 2)\
    .load()
java.sql.SQLException: com.datastax.driver.core.exceptions.SyntaxError: line 1:14 no viable alternative at input 'select' (SELECT * FROM [select]...)
at com.dbschema.CassandraPreparedStatement.executeQuery(CassandraPreparedStatement.java:113)
at org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$.resolveTable(JDBCRDD.scala:62)
at org.apache.spark.sql.execution.datasources.jdbc.JDBCRelation.<init>(JDBCRelation.scala:113)
at org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider.createRelation(JdbcRelationProvider.scala:45)
at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:330)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:152)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:125)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:282)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
Answer 0 (score: 1)
Based on your query, I am offering this solution. It should work for almost all common SELECT queries (not sure about joins). I am splitting your query into parts, as shown below, because that is how I have worked with Cassandra. I have also changed the driver being used. Hope it works for you!
# Split the query into a column list, a table name and an optional WHERE clause.
# Note: the splits are case-sensitive, so the query must use lowercase keywords.
column_names = query.split("select")[1].split("from")[0].strip().split(",")
print(column_names)

table_name = query.split("from")[1].strip().split(" ")[0]
print(table_name)

if "where" in query:
    where_condition = query.split("where")[1].strip()
    print(where_condition)
    df = self.spark_sql_context.read.format("org.apache.spark.sql.cassandra") \
        .load(table=table_name, keyspace=self.__keyspace) \
        .select(column_names).where(where_condition)
else:
    df = self.spark_sql_context.read.format("org.apache.spark.sql.cassandra") \
        .load(table=table_name, keyspace=self.__keyspace) \
        .select(column_names)
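As a standalone sketch of the string handling above (my addition, not part of the original answer), the same split can be done case-insensitively with a regular expression; the DataFrame calls are omitted here, and joins are still not handled, matching the caveat above:

```python
import re

def parse_select(query):
    """Split a simple SELECT query (no joins) into columns, table and WHERE."""
    m = re.match(
        r"\s*select\s+(?P<cols>.+?)\s+from\s+(?P<table>\w+)"
        r"(?:\s+where\s+(?P<where>.+?))?(?:\s+limit\s+\d+)?\s*$",
        query,
        flags=re.IGNORECASE,
    )
    if m is None:
        raise ValueError("unsupported query: " + query)
    columns = [c.strip() for c in m.group("cols").split(",")]
    return columns, m.group("table"), m.group("where")

print(parse_select("select * from emp LIMIT 10"))
print(parse_select("SELECT emp_id, emp_name FROM emp WHERE emp_id = 3"))
```

The three returned pieces can then be fed to `.load(...)`, `.select(...)` and `.where(...)` exactly as in the answer's code.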
Answer 1 (score: 0)
If you have already set up the Spark and Cassandra integration, you can access it like this:
spark_sql_context.read.format("org.apache.spark.sql.cassandra").options(table="tablename", keyspace="keyspace name").load()
Update
In Java, we can execute a specific query as follows:
public static List<Row> selectSectorHourlyCounterTotals(String sectorName) {
    Statement statement = new SimpleStatement(
        "select * from tablename where sector_name = '" + sectorName + "' allow filtering");
    ResultSet resultSet = dbSession.execute(statement);
    return resultSet.all();
}
You will need to convert this to Scala/Python.
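For the Python side, a minimal sketch of the same query builder (my addition, not from the answer; `tablename` is the placeholder name from the Java snippet, and the quote-escaping is a precaution the Java version does not take):

```python
def build_sector_query(sector_name):
    """Build the CQL string the Java snippet concatenates by hand."""
    # Escape single quotes CQL-style ('' inside a quoted literal) so a value
    # containing an apostrophe cannot break out of the string.
    escaped = sector_name.replace("'", "''")
    return ("select * from tablename where sector_name = '"
            + escaped + "' allow filtering")

print(build_sector_query("north"))
```

With the DataStax `cassandra-driver` package you would normally skip the string-building entirely and bind the value instead, e.g. `session.execute("select * from tablename where sector_name = %s allow filtering", (sector_name,))`.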