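I have a custom export of multiple tables and their columns in a multi-level YAML format. Here is a sample extract, modified with dummy values: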
schemas:
- name: exports
  tables:
  - name: sugar
    description: makes stuff sweet
    active_date: 2019-01-07 00:00:00
    columns:
    - name: color
      type: abcd
    - name: taste
      type: abcd
      description: xyz
      example: 21352352
    - name: structure
      type: abcd
      description: xyzasaa
      example: 10001
  - name: salt
    description: not that sweet.
      makes it salty.
    active_date: 2018-12-18 00:00:00
    columns:
    - name: strength
      type: abcdef
      description: easy to find
      example: 2018-03-03 12:30:00
    - name: color
      type: abcdeffa
      description: not sweet
      example: 21352352
    - name: quality
      type: abcd
      description: how much is needed
      example: 10001
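I need to import this data into a Hive table using some SerDe. I am familiar with the JSON SerDe, but unfortunately it does not support this format, so I am looking for an alternative. Can anyone suggest the best approach? Would RegexSerDe do everything I am trying to achieve?
The Hive table data can be represented in either of the following ways: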
<style type="text/css">
table.tableizer-table {
font-size: 12px;
border: 1px solid #CCC;
font-family: Arial, Helvetica, sans-serif;
}
.tableizer-table td {
padding: 4px;
margin: 3px;
border: 1px solid #CCC;
}
.tableizer-table th {
background-color: #104E8B;
color: #FFF;
font-weight: bold;
}
</style>
<table class="tableizer-table">
<thead><tr class="tableizer-firstrow"><th>Level 1 (name)</th><th>Level 2 (name)</th><th>Level 2 (type)</th><th>Level 2 (description)</th></tr></thead><tbody>
<tr><td>sugar</td><td>color</td><td>abcd</td><td> </td></tr>
<tr><td>sugar</td><td>taste</td><td>abcd</td><td>xyz</td></tr>
<tr><td>sugar</td><td>structure</td><td>abcd</td><td>xyzasaa</td></tr>
<tr><td>salt</td><td>strength</td><td>abcdef</td><td>easy to find</td></tr>
<tr><td>salt</td><td>color</td><td>abcdeffa</td><td>not sweet</td></tr>
<tr><td>salt</td><td>quality</td><td>abcd</td><td>how much is needed</td></tr>
</tbody></table>
--- OR ---
<table class="tableizer-table">
<thead><tr class="tableizer-firstrow"><th>Level 1 (name.column)</th><th>Level 2 (type)</th><th>Level 2 (description)</th></tr></thead><tbody>
<tr><td>sugar.color</td><td>abcd</td><td> </td></tr>
<tr><td>sugar.taste</td><td>abcd</td><td>xyz</td></tr>
<tr><td>sugar.structure</td><td>abcd</td><td>xyzasaa</td></tr>
<tr><td>salt.strength</td><td>abcdef</td><td>easy to find</td></tr>
<tr><td>salt.color</td><td>abcdeffa</td><td>not sweet</td></tr>
<tr><td>salt.quality</td><td>abcd</td><td>how much is needed</td></tr>
</tbody></table>
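For what it's worth, rows shaped like the first table could probably be produced outside Hive and then loaded as plain delimited text, which sidesteps the SerDe question. The following is only a rough sketch under some assumptions: the real file keeps its indentation, the nesting is always schema > table > column, and the names `flatten`, `ITEM`, `FIELD` and `SAMPLE` are made up for illustration. It is a line-based scan, not a real YAML parser, so multi-line descriptions would be cut off at the first line.

```python
import re

# "- name: X" starts a schema, table or column entry; the indent tells which.
ITEM = re.compile(r"^(\s*)- name:\s*(.*\S)\s*$")
# Only these two attributes are copied into the output rows.
FIELD = re.compile(r"^\s*(type|description):\s*(.*\S)\s*$")

def flatten(lines):
    """Return [table, column, type, description] for every column entry."""
    stack = []   # (indent, name) of the enclosing "- name:" entries
    row = None   # the column row currently being filled in, if any
    rows = []
    for line in lines:
        item = ITEM.match(line)
        if item:
            indent, name = len(item.group(1)), item.group(2)
            # Leave every entry that is indented as deep or deeper.
            while stack and stack[-1][0] >= indent:
                stack.pop()
            stack.append((indent, name))
            row = None
            if len(stack) == 3:  # assumed nesting: schema > table > column
                row = [stack[1][1], name, "", ""]
                rows.append(row)
            continue
        field = FIELD.match(line)
        if field and row is not None:  # table-level attributes are skipped
            row[2 if field.group(1) == "type" else 3] = field.group(2)
    return rows

SAMPLE = """\
schemas:
- name: exports
  tables:
  - name: sugar
    description: makes stuff sweet
    columns:
    - name: color
      type: abcd
    - name: taste
      type: abcd
      description: xyz
"""

for r in flatten(SAMPLE.splitlines()):
    print("\t".join(r))  # one tab-separated line per column
```

The tab-separated lines could then be placed under a table declared with `ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'`, assuming no field itself contains a tab or newline.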
Edit: Using the simplest approach, I was able to extract the following:
$ grep -P '(?<=- name: ).*' export.yaml
- name: exports
  - name: sugar
    - name: color
    - name: taste
    - name: structure
  - name: salt
    - name: strength
    - name: color
    - name: quality
But how do I establish the indentation relationship, so that the result looks like this:
sugar.color,sugar.taste,sugar.structure
salt.strength,salt.color,salt.quality
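One way the indentation relationship might be recovered is to keep a stack of the `- name:` entries seen so far and pop it whenever a line is indented no deeper than the entry on top. This is just a sketch, assuming the file keeps its indentation and that depth 1 is the schema, depth 2 a table and depth 3 a column; `qualified_columns`, `NAME_LINE` and `SAMPLE` are hypothetical names, not part of any tool.

```python
import re

NAME_LINE = re.compile(r"^(\s*)- name:\s*(.*\S)\s*$")

def qualified_columns(lines):
    """Map each table name to its list of "table.column" strings."""
    stack = []    # (indent, name) of the enclosing "- name:" entries
    result = {}   # dicts keep insertion order, so tables stay in file order
    for line in lines:
        m = NAME_LINE.match(line)
        if not m:
            continue
        indent, name = len(m.group(1)), m.group(2)
        # A line indented no deeper than the top entry closes that entry.
        while stack and stack[-1][0] >= indent:
            stack.pop()
        stack.append((indent, name))
        if len(stack) == 2:        # assumed: second level is a table
            result.setdefault(name, [])
        elif len(stack) == 3:      # assumed: third level is a column
            table = stack[1][1]
            result[table].append(f"{table}.{name}")
    return result

SAMPLE = """\
schemas:
- name: exports
  tables:
  - name: sugar
    columns:
    - name: color
    - name: taste
    - name: structure
  - name: salt
    columns:
    - name: strength
    - name: color
    - name: quality
"""

for columns in qualified_columns(SAMPLE.splitlines()).values():
    print(",".join(columns))
```

Because only relative indentation is compared, the exact number of spaces per level should not matter, as long as nested entries are indented more than their parents.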