我正在使用Apache Spark和Apache Ignite。我有一个使用以下代码在Ignite中编写的spark数据集
dataset.write()
.mode(SaveMode.Overwrite)
.format(FORMAT_IGNITE())
.option(OPTION_CONFIG_FILE(), "ignite-server-config.xml")
.option(OPTION_TABLE(), "CUSTOM_VALUES")
.option(OPTION_CREATE_TABLE_PRIMARY_KEY_FIELDS(), "ID")
.save();
我正在重新阅读它以按分组执行操作,该操作将被推送到Ignite。
Dataset igniteDataset = sparkSession.read()
.format(FORMAT_IGNITE())
.option(OPTION_CONFIG_FILE(), "ignite-server-config.xml")
.option(OPTION_TABLE(), "CUSTOM_VALUES")
.load();
RelationalGroupedDataset idGroupedData = igniteDataset.groupBy(customized_id);
Dataset<Row> result = idGroupedData.agg(count(id).as("count_id"),
count(fid).as("count_custom_field_id"),
count(type).as("count_customized_type"),
count(val).as("count_value"), count(customized_id).as("groupCount"));
现在,我想获取groupby操作返回的行数。因此,我将数据集上的count()称为result.count();
执行此操作时,出现以下异常。
Caused by: org.h2.jdbc.JdbcSQLException: Syntax error in SQL statement "SELECT COUNT(1) AS COUNT FROM (SELECT FROM CUSTOM_VALUES GROUP[*] BY CUSTOMIZED_ID) TABLE1 "; expected "., (, USE, AS, RIGHT, LEFT, FULL, INNER, JOIN, CROSS, NATURAL, ,, SELECT"; SQL statement:
SELECT COUNT(1) AS count FROM (SELECT FROM CUSTOM_VALUES GROUP BY CUSTOMIZED_ID) table1 [42001-197]
at org.h2.message.DbException.getJdbcSQLException(DbException.java:357)
at org.h2.message.DbException.getSyntaxError(DbException.java:217)
其他功能(例如show(), collectAsList().size();
)也可以使用。
我在这里想念什么?
答案 0 :(得分:0)
我针对最新的GridGain社区版本8.7.5测试了您的示例,该版本是Gridgain的开源版本,它基于Ignite 2.7.0来源,并带有其他修复程序(https://www.gridgain.com/resources/download)的子集。
代码如下:
public class Main {
public static void main(String[] args) {
if (args.length < 1)
throw new IllegalArgumentException("You should set the path to client configuration file.");
String configPath = args[0];
SparkSession session = SparkSession.builder()
.enableHiveSupport()
.getOrCreate();
Dataset<Row> igniteDataset = session.read()
.format(IgniteDataFrameSettings.FORMAT_IGNITE()) //Data source
.option(IgniteDataFrameSettings.OPTION_TABLE(), "Person") //Table to read.
.option(IgniteDataFrameSettings.OPTION_CONFIG_FILE(), configPath) //Ignite config.
.load();
RelationalGroupedDataset idGroupedData = igniteDataset.groupBy("CITY_ID");
Dataset<Row> result = idGroupedData.agg(count("id").as("count_id"),
count("city_id").as("count_city_id"),
count("name").as("count_name"),
count("age").as("count_age"),
count("company").as("count_company"));
result.show();
session.close();
}
}
以下是Maven依赖项:
<dependencies>
<dependency>
<groupId>org.gridgain</groupId>
<artifactId>gridgain-core</artifactId>
<version>8.7.5</version>
</dependency>
<dependency>
<groupId>org.gridgain</groupId>
<artifactId>ignite-core</artifactId>
<version>8.7.5</version>
</dependency>
<dependency>
<groupId>org.gridgain</groupId>
<artifactId>ignite-spring</artifactId>
<version>8.7.5</version>
</dependency>
<dependency>
<groupId>org.gridgain</groupId>
<artifactId>ignite-indexing</artifactId>
<version>8.7.5</version>
</dependency>
<dependency>
<groupId>org.gridgain</groupId>
<artifactId>ignite-spark</artifactId>
<version>8.7.5</version>
</dependency>
</dependencies>
这是缓存配置:
<property name="cacheConfiguration">
<list>
<bean class="org.apache.ignite.configuration.CacheConfiguration">
<property name="name" value="Person"/>
<property name="cacheMode" value="PARTITIONED"/>
<property name="atomicityMode" value="ATOMIC"/>
<property name="sqlSchema" value="PUBLIC"/>
<property name="queryEntities">
<list>
<bean class="org.apache.ignite.cache.QueryEntity">
<property name="keyType" value="PersonKey"/>
<property name="valueType" value="PersonValue"/>
<property name="tableName" value="Person"/>
<property name="keyFields">
<list>
<value>id</value>
<value>city_id</value>
</list>
</property>
<property name="fields">
<map>
<entry key="id" value="java.lang.Integer"/>
<entry key="city_id" value="java.lang.Integer"/>
<entry key="name" value="java.lang.String"/>
<entry key="age" value="java.lang.Integer"/>
<entry key="company" value="java.lang.String"/>
</map>
</property>
<property name="aliases">
<map>
<entry key="id" value="id"/>
<entry key="city_id" value="city_id"/>
<entry key="name" value="name"/>
<entry key="age" value="age"/>
<entry key="company" value="company"/>
</map>
</property>
</bean>
</list>
</property>
</bean>
</list>
</property>
使用仅受ignite-spark依赖关系支持的Spark 2.3.0,我的测试数据有下一个结果:
数据:
ID,CITY_ID,NAME,AGE,COMPANY,
4,1,Justin Bronte,23,bank,
3,1,Helen Richard,49,bank,
结果:
+-------+--------+-------------+----------+---------+-------------+
|CITY_ID|count_id|count_city_id|count_name|count_age|count_company|
+-------+--------+-------------+----------+---------+-------------+
| 1| 2| 2| 2| 2| 2|
+-------+--------+-------------+----------+---------+-------------+
此外,此代码可以完全应用于Ignite 2.7.0。