While developing the code, I used the snippet below to read table data from BigQuery:
```java
PCollection<ReasonCode> gpseEftReasonCodes = input
    .apply("Reading xxyyzz",
        BigQueryIO.read(new ReadTable<ReasonCode>(ReasonCode.class))
            .withoutValidation()
            .withTemplateCompatibility()
            .fromQuery("Select * from dataset.xxyyzz")
            .usingStandardSql()
            .withCoder(SerializableCoder.of(ReasonCode.class)));
```
The ReadTable class:
```java
import java.util.HashMap;
import java.util.Map;

import org.apache.avro.Schema.Field;
import org.apache.avro.generic.GenericRecord;
import org.apache.beam.sdk.io.gcp.bigquery.SchemaAndRecord;
import org.apache.beam.sdk.metrics.Counter;
import org.apache.beam.sdk.metrics.Metrics;
import org.apache.beam.sdk.schemas.JavaBeanSchema;
import org.apache.beam.sdk.schemas.annotations.DefaultSchema;
import org.apache.beam.sdk.transforms.SerializableFunction;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import com.google.gson.Gson;
import com.google.gson.JsonElement;

@DefaultSchema(JavaBeanSchema.class)
public class ReadTable<T> implements SerializableFunction<SchemaAndRecord, T> {
    private static final long serialVersionUID = 1L;
    private static final Gson gson = new Gson();
    public static final Logger LOG = LoggerFactory.getLogger(ReadTable.class);
    private final Counter countingRecords =
        Metrics.counter(ReadTable.class, "Reading Records EFT Report");
    private final Class<T> class1;

    public ReadTable(Class<T> class1) {
        this.class1 = class1;
    }

    @Override
    public T apply(SchemaAndRecord schemaAndRecord) {
        Map<String, String> mapping = new HashMap<>();
        try {
            GenericRecord record = schemaAndRecord.getRecord();
            org.apache.avro.Schema schema = record.getSchema();
            // Stringify every field by name, then let Gson map the
            // resulting map onto the target POJO.
            for (Field f : schema.getFields()) {
                Object value = record.get(f.name());
                mapping.put(f.name(), value == null ? null : String.valueOf(value));
            }
            countingRecords.inc();
            JsonElement jsonElement = gson.toJsonTree(mapping);
            return gson.fromJson(jsonElement, class1);
        } catch (Exception e) {
            LOG.error("Found wrong mapping for the record: " + mapping);
            e.printStackTrace();
            return null;
        }
    }
}
```
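Since the function stringifies every field into a `Map<String, String>` before handing it to Gson, the target type referenced by `class1` would be a plain POJO whose field names match the record's field names. A hypothetical example (the field names are illustrative, not from the original post):

```java
import java.io.Serializable;

// Hypothetical target for the Gson mapping above; Gson matches the map
// keys against these field names via reflection, so they mirror the
// BigQuery column names.
public class ReasonCode implements Serializable {
    private String reason_code_id;
    private String last_update_amount;
}
```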
So, after reading the data from BigQuery, I map it from SchemaAndRecord to my POJO, and for columns whose data type is NUMERIC I get values like the one below:
```
last_update_amount=java.nio.HeapByteBuffer[pos=0 lim=16 cap=16]
```
My expectation was that I would get the exact value, but what I get is a HeapByteBuffer. The version I am using is Apache Beam 2.12.0. Let me know if more information is needed.
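For context: BigQuery exports NUMERIC columns to Avro as `bytes` carrying the `decimal` logical type, which is why `GenericRecord.get()` returns a `ByteBuffer` rather than a number. A minimal decoding sketch, assuming the exported field schema carries that logical type (`NumericDecoder` and `readNumeric` are illustrative names, not part of the original code):

```java
import java.math.BigDecimal;
import java.nio.ByteBuffer;

import org.apache.avro.Conversions;
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericRecord;

// Hypothetical helper for turning a BigQuery NUMERIC field, exported as
// Avro bytes with the decimal logical type, back into a BigDecimal.
public class NumericDecoder {
    public static BigDecimal readNumeric(GenericRecord record, String fieldName) {
        Schema fieldSchema = record.getSchema().getField(fieldName).schema();
        // Nullable columns are exported as a union ["null", bytes]; unwrap it.
        if (fieldSchema.getType() == Schema.Type.UNION) {
            for (Schema branch : fieldSchema.getTypes()) {
                if (branch.getType() != Schema.Type.NULL) {
                    fieldSchema = branch;
                    break;
                }
            }
        }
        Object value = record.get(fieldName);
        return value == null
            ? null
            : new Conversions.DecimalConversion()
                .fromBytes((ByteBuffer) value, fieldSchema, fieldSchema.getLogicalType());
    }
}
```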
Approach 2, which I also tried:

```java
GenericRecord record = schemaAndRecord.getRecord();
org.apache.avro.Schema recordSchema = record.getSchema();
for (Field f : recordSchema.getFields()) {
    Object value = record.get(f.name());
    mapping.put(f.name(), value == null ? null : String.valueOf(value));
    if (f.name().equalsIgnoreCase("reason_code_id")) {
        // Note: recordSchema is the schema of the whole record, so its type
        // is RECORD; Schema.create() accepts only primitive types, which is
        // what raises the AvroRuntimeException below. The field's own schema
        // and logical type would have to be passed here instead.
        BigDecimal numericValue = new Conversions.DecimalConversion()
            .fromBytes((ByteBuffer) record.get(f.name()),
                Schema.create(recordSchema.getType()),
                recordSchema.getLogicalType());
        System.out.println("Numeric Con" + numericValue);
    } else {
        System.out.println("Else Condition " + f.name());
    }
}
```
The issue I am facing:

```
2019-05-24 (14:10:37) org.apache.avro.AvroRuntimeException: Can't create a: RECORD
```
Answer 0 (score: 0):
The overall approach is correct; it is hard to figure out what exactly went wrong. If possible, please paste the full stack trace. Also, have a look at the examples of how `BigQueryIO.read()` is used, they might help: https://beam.apache.org/releases/javadoc/2.13.0/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.html

You can use `readTableRows()` instead of `read()` and get the parsed values. Or follow the `TableRowParser` implementation as an example of how such a parser works (it is used in `readTableRows()`): https://github.com/apache/beam/blob/79d478a83be221461add1501e218b9a4308f9ec8/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.java#L449
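For illustration, a minimal sketch of the `readTableRows()` route (the step name, query, and column name are carried over from the question; whether a NUMERIC column arrives in the `TableRow` as a plain value on a given Beam version is an assumption worth verifying):

```java
// Fragment; assumes p is the Pipeline and the imports used elsewhere in
// this thread. NUMERIC arriving as a plain value is an assumption here.
PCollection<TableRow> rows = p.apply("Reading xxyyzz as TableRows",
    BigQueryIO.readTableRows()
        .fromQuery("Select * from dataset.xxyyzz")
        .usingStandardSql());

// TableRowParser has already converted the Avro GenericRecord, so the
// value can be read as a plain object instead of a ByteBuffer.
PCollection<BigDecimal> amounts = rows.apply(
    MapElements.into(TypeDescriptor.of(BigDecimal.class))
        .via(row -> new BigDecimal(row.get("last_update_amount").toString())));
```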
Update:
Apparently, the ability to read rows using Beam schemas was added recently: https://github.com/apache/beam/pull/8620
You should now be able to do something like this:
```java
p.apply(BigQueryIO.readTableRowsWithSchema())
    .apply(Convert.to(PojoClass.class));
```
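For `Convert.to(...)` to work, `PojoClass` needs a Beam schema, for example via the same `@DefaultSchema` annotation used on the question's `ReadTable` class. A hypothetical sketch (the class and field names are illustrative; the bean property names must line up with the schema's field names):

```java
import java.io.Serializable;

import org.apache.beam.sdk.schemas.JavaBeanSchema;
import org.apache.beam.sdk.schemas.annotations.DefaultSchema;

// Hypothetical POJO; Convert.to(...) matches schema fields against the
// bean properties that JavaBeanSchema derives from these getters/setters.
@DefaultSchema(JavaBeanSchema.class)
public class PojoClass implements Serializable {
    private String reasonCodeId;

    public String getReasonCodeId() { return reasonCodeId; }
    public void setReasonCodeId(String reasonCodeId) { this.reasonCodeId = reasonCodeId; }
}
```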