Question

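I want to write a generic Flink job in Java that accepts any SQL SELECT query, runs it against a SQL database, and writes the result to an Elasticsearch index.

One problem I have to solve is creating a DataSource for the JDBC connection. I want to use JDBCInputFormat, and I followed the example in the data sources section of the documentation.

The problem is that the generic type of the DataSource must be specified. I can only use a Tuple type, because the generic type OUT of JDBCInputFormat extends Tuple. But I don't know at compile time which Tuple I will use.

Am I misinterpreting something?
Can I use another JDBC InputFormat?
Is there a way to specify Tuple as the generic type?

I am using Java 7 and apache-flink 0.10.2.

I tried using a Tuple25 that contains only Strings, but I got an exception.

Below is the code, followed by the exception.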

DataSource<StringsTuple25> database = flink.createInput(
JDBCInputFormat.buildJDBCInputFormat()//
  .setDrivername(getDatabaseDriverName())//
  .setDBUrl(getDatabaseUrl())//
  .setUsername(getDatabaseUsername())//
  .setPassword(getDatabasePassword())//
  .setQuery(getQuery())//
  .finish(), 
  StringsTuple25.typeInformation()
);

My StringsTuple25 class:

public class StringsTuple25 extends
  Tuple25<String, String, String, String, String, String, String, String, String, String, String, String, String, String, String, String, String, String, String, String, String, String, String, String, String> {

  private static final long serialVersionUID = 1L;

  public static TypeInformation<?> typeInformation() {
    TypeInformation<String>[] types = new TypeInformation[25];
    Arrays.fill(types, STRING_TYPE_INFO);
    return new TupleTypeInfo<>(Tuple25.class,types);
  }
}

I get this exception:

Caused by: java.io.IOException: Tuple size does not match columncount
  at org.apache.flink.api.java.io.jdbc.JDBCInputFormat.extractTypes(JDBCInputFormat.java:180)
  at org.apache.flink.api.java.io.jdbc.JDBCInputFormat.nextRecord(JDBCInputFormat.java:162)
  at org.apache.flink.api.java.io.jdbc.JDBCInputFormat.nextRecord(JDBCInputFormat.java:51)
  at org.apache.flink.runtime.operators.DataSourceTask.invoke(DataSourceTask.java:169)
  at org.apache.flink.runtime.taskmanager.Task.run(Task.java:584)
  at java.lang.Thread.run(Thread.java:745)

Answer 1

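As the error says, the number of attributes of the Tuple type you use must match the number of columns selected by your SQL query. In addition, the data type of each attribute must match the type of the corresponding column.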

For example, for SELECT id, name FROM ... where id is an INTEGER and name is a VARCHAR, you would use DataSource<Tuple2<Integer,String>> (or define your own class MyResultType extends Tuple2<Integer,String> and use DataSource<MyResultType>) and provide the corresponding TypeInformation.
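To work out which Java class each selected column needs, the JDBC specification's standard mapping from java.sql.Types codes to Java object types can be written as a small lookup. This is a sketch under assumptions: the class JdbcTypeMapping and its methods are hypothetical, only a subset of types is covered, and it follows the JDBC specification rather than whatever JDBCInputFormat does internally, so check the Flink source for your version before relying on it. The type codes would typically come from ResultSetMetaData.getColumnType(i).

```java
import java.math.BigDecimal;
import java.sql.Types;
import java.util.Arrays;

// Hypothetical helper: maps java.sql.Types codes to the Java object types
// listed in the JDBC specification. Illustrative subset only; JDBCInputFormat
// may read some column types differently.
class JdbcTypeMapping {

    static Class<?> javaClassFor(int sqlType) {
        switch (sqlType) {
            case Types.CHAR:
            case Types.VARCHAR:
            case Types.LONGVARCHAR:
                return String.class;
            case Types.INTEGER:
                return Integer.class;
            case Types.BIGINT:
                return Long.class;
            case Types.REAL:
                return Float.class;
            case Types.FLOAT:
            case Types.DOUBLE:
                return Double.class;
            case Types.DECIMAL:
            case Types.NUMERIC:
                return BigDecimal.class;
            case Types.BIT:
            case Types.BOOLEAN:
                return Boolean.class;
            default:
                throw new IllegalArgumentException("No mapping for java.sql.Types code " + sqlType);
        }
    }

    // One Java class per selected column, in column order.
    static Class<?>[] fieldClassesFor(int... sqlTypes) {
        Class<?>[] classes = new Class<?>[sqlTypes.length];
        for (int i = 0; i < sqlTypes.length; i++) {
            classes[i] = javaClassFor(sqlTypes[i]);
        }
        return classes;
    }

    public static void main(String[] args) {
        // SELECT id, name FROM ... where id is INTEGER and name is VARCHAR
        Class<?>[] fields = fieldClassesFor(Types.INTEGER, Types.VARCHAR);
        System.out.println(Arrays.toString(fields)); // [class java.lang.Integer, class java.lang.String]
    }
}
```

For the id/name example the result is Integer followed by String, which is exactly the field list of Tuple2<Integer,String>.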

You can also use the generic Tuple type. Your data set is then a DataSource<Tuple> (without specifying the number or types of the attributes). However, to create the TypeInformation you need to know the number of attributes:

Tuple t = Tuple.getTupleClass(numberOfAttributes).newInstance();
for (int i = 0; i < numberOfAttributes; i++) {
  t.setField("", i);
}
TypeInformation<Tuple> typeInfo = TypeExtractor.getForObject(t);

Because every field is initialized with "", this TypeInformation declares all attributes as String.


Thus, you need to infer the number of selected attributes from the parameter that defines your SQL query.
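A minimal sketch of that inference follows, assuming a single plain SELECT ... FROM ... statement; countSelectedColumns is a hypothetical helper, not part of Flink. It splits the select list on commas outside parentheses and string literals, so it will miscount SELECT *, comments, or quoted identifiers that contain commas. A more robust alternative is to ask the database: PreparedStatement.getMetaData() returns a ResultSetMetaData whose getColumnCount() gives the count, although some drivers return null before the statement is executed. Either way the count must not exceed 25, because Tuple25 is the largest tuple class in Flink.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: a naive estimate of how many columns a single
// "SELECT ... FROM ..." statement returns.
class SelectColumnCounter {

    private static final Pattern SELECT = Pattern.compile("\\bselect\\b", Pattern.CASE_INSENSITIVE);

    static int countSelectedColumns(String query) {
        Matcher m = SELECT.matcher(query);
        if (!m.find()) {
            throw new IllegalArgumentException("No SELECT keyword in: " + query);
        }
        int depth = 0;            // nesting level of parentheses
        boolean inString = false; // inside a '...' literal
        int columns = 1;
        for (int i = m.end(); i < query.length(); i++) {
            char c = query.charAt(i);
            if (c == '\'') {
                inString = !inString;
            } else if (inString) {
                continue; // commas and keywords inside literals are ignored
            } else if (c == '(') {
                depth++;
            } else if (c == ')') {
                depth--;
            } else if (depth == 0 && c == ',') {
                columns++; // a top-level comma separates two select expressions
            } else if (depth == 0 && isKeywordAt(query, i, "FROM")) {
                return columns; // end of the select list
            }
        }
        return columns; // no FROM clause, e.g. "SELECT 1, 2"
    }

    // True if keyword appears at position i as a whole word (case-insensitive).
    private static boolean isKeywordAt(String s, int i, String keyword) {
        if (!s.regionMatches(true, i, keyword, 0, keyword.length())) {
            return false;
        }
        int end = i + keyword.length();
        boolean startOk = i == 0 || !isWordChar(s.charAt(i - 1));
        boolean endOk = end == s.length() || !isWordChar(s.charAt(end));
        return startOk && endOk;
    }

    private static boolean isWordChar(char c) {
        return Character.isLetterOrDigit(c) || c == '_';
    }

    public static void main(String[] args) {
        System.out.println(countSelectedColumns("SELECT id, name FROM users")); // 2
    }
}
```

The returned count is what the answer's snippet expects as numberOfAttributes in Tuple.getTupleClass(numberOfAttributes).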

Using JDBCInputFormat with a variable tuple size (apache-flink)
