Using JDBCInputFormat with a variable tuple size (apache-flink)

Time: 2016-02-24 17:03:14

Tags: java apache-flink

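I want to write a generic Flink job in Java that can take any SQL SELECT query, run it against a SQL database, and write the result to an Elasticsearch index.

One of the problems I have to solve is creating a DataSource for the JDBC connection. I want to use JDBCInputFormat, and I followed the data source example in the documentation.

The problem is that the generic type of the DataSource must be specified. I can only use a Tuple type, because the generic type OUT of JDBCInputFormat extends Tuple. But I do not know at compile time which Tuple I will use.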

  1. How can the error below be explained?
  2. Can I use another JDBC InputFormat?
  3. Is there a way to specify Tuple as the generic type?
  4. I am using Java 7 and apache-flink 0.10.2.

    I tried using a Tuple25 that contains only Strings, but I got an exception.

    Here is the code, followed by the exception.

    DataSource<StringsTuple25> database = flink.createInput(
    JDBCInputFormat.buildJDBCInputFormat()//
      .setDrivername(getDatabaseDriverName())//
      .setDBUrl(getDatabaseUrl())//
      .setUsername(getDatabaseUsername())//
      .setPassword(getDatabasePassword())//
      .setQuery(getQuery())//
      .finish(), 
      StringsTuple25.typeInformation()
    );
    

    My StringsTuple25 class:

    public class StringsTuple25 extends
      Tuple25<String, String, String, String, String, String, String, String, String, String, String, String, String, String, String, String, String, String, String, String, String, String, String, String, String> {
    
      private static final long serialVersionUID = 1L;
    
      public static TypeInformation<?> typeInformation() {
        TypeInformation<String>[] types = new TypeInformation[25];
        Arrays.fill(types, STRING_TYPE_INFO);
        return new TupleTypeInfo<>(Tuple25.class,types);
      }
    }
    

    I get this exception:

    Caused by: java.io.IOException: Tuple size does not match columncount
      at org.apache.flink.api.java.io.jdbc.JDBCInputFormat.extractTypes(JDBCInputFormat.java:180)
      at org.apache.flink.api.java.io.jdbc.JDBCInputFormat.nextRecord(JDBCInputFormat.java:162)
      at org.apache.flink.api.java.io.jdbc.JDBCInputFormat.nextRecord(JDBCInputFormat.java:51)
      at org.apache.flink.runtime.operators.DataSourceTask.invoke(DataSourceTask.java:169)
      at org.apache.flink.runtime.taskmanager.Task.run(Task.java:584)
      at java.lang.Thread.run(Thread.java:745)
    

1 answer:

Answer 0 (score: 2)

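As the error indicates, the number of attributes in the Tuple type you use must match the number of columns selected in the SQL query. In addition, the data type of each attribute must match the type of the corresponding column.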

For example, for a query SELECT id, name FROM ... where id is an INTEGER and name is a VARCHAR, you would use DataStream<Tuple2<Integer,String>> (or define your own specialized class, class MyResultType extends Tuple2<Integer,String>, and use DataStream<MyResultType>) and provide the corresponding TypeInformation.

You can also use the generic Tuple type. Your stream would then be a DataStream<Tuple> (with neither the number nor the types of the attributes specified). However, for the TypeInformation you still need to know the number of attributes:

    // Instantiate the TupleN class that has the required number of fields.
    Tuple t = Tuple.getTupleClass(numberOfAttributes).newInstance();
    // Put a String into every field so the extracted type info describes String attributes.
    for(int i = 0; i < numberOfAttributes; i++) {
      t.setField("", i);
    }
    TypeInformation<Tuple> typeInfo = TypeExtractor.getForObject(t);

So you need to infer the number of selected attributes from the parameters you are given that define the SQL query.
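
One way to do that inference, as a rough sketch only: count the top-level expressions in the SELECT list of the query string. The class SelectColumnCounter and its method countSelectedColumns are hypothetical names introduced here, not part of Flink or JDBC. The parser is deliberately naive: it only handles plain SELECT queries, ignores commas inside parentheses and single-quoted strings, and rejects SELECT * because the column count is then known only to the database. If a connection is available when the job is set up, asking JDBC's ResultSetMetaData.getColumnCount() is probably the more robust option.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: naively counts the columns in the SELECT list of a query. */
final class SelectColumnCounter {

    private SelectColumnCounter() {}

    /** Returns the number of top-level expressions between SELECT and the first top-level FROM. */
    static int countSelectedColumns(String query) {
        String sql = query.trim();
        if (sql.length() < 7 || !sql.regionMatches(true, 0, "SELECT", 0, 6)
                || !Character.isWhitespace(sql.charAt(6))) {
            throw new IllegalArgumentException("Only plain SELECT queries are supported");
        }
        List<String> items = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int depth = 0;            // nesting level of parentheses
        boolean inString = false; // inside a single-quoted SQL literal
        for (int i = 7; i < sql.length(); i++) {
            char c = sql.charAt(i);
            if (c == '\'') {
                inString = !inString;
            } else if (!inString) {
                if (c == '(') {
                    depth++;
                } else if (c == ')') {
                    depth--;
                } else if (depth == 0 && c == ',') {
                    // A top-level comma ends the current select item.
                    items.add(current.toString().trim());
                    current.setLength(0);
                    continue;
                } else if (depth == 0 && isKeywordAt(sql, i, "FROM")) {
                    break; // end of the select list
                }
            }
            current.append(c);
        }
        items.add(current.toString().trim());
        for (String item : items) {
            if (item.isEmpty() || item.equals("*") || item.endsWith(".*")) {
                throw new IllegalArgumentException("Cannot count columns of select item: '" + item + "'");
            }
        }
        return items.size();
    }

    /** True if the keyword starts at pos as a whole word (case-insensitive). */
    private static boolean isKeywordAt(String sql, int pos, String keyword) {
        int end = pos + keyword.length();
        return sql.regionMatches(true, pos, keyword, 0, keyword.length())
                && !isIdentifierChar(sql.charAt(pos - 1))
                && (end >= sql.length() || !isIdentifierChar(sql.charAt(end)));
    }

    private static boolean isIdentifierChar(char c) {
        return Character.isLetterOrDigit(c) || c == '_';
    }

    public static void main(String[] args) {
        int numberOfAttributes = countSelectedColumns("SELECT id, name FROM users");
        System.out.println("numberOfAttributes = " + numberOfAttributes);
    }
}
```

The resulting count would play the role of numberOfAttributes in the snippet above. Note that the count alone does not give the column types, so it only fits the case where every attribute is read as the same type.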