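The input below contains two record types. It is a fixed-width file whose records are separated by newlines. Sample records are shown further down.
Record type 41: record length 629
Record type 42: record length 557
Record type 42 is similar to record type 41 but is missing 3 fields, so I pad the missing fields with spaces. After that I will create the Hive table with the table property 'serialization.null.format' set to a space (via TBLPROPERTIES), so that spaces are treated as NULL in the Hive table. (If there is a better way to handle this, please suggest it.)
In record type 42, 26 spaces are added at columns 88 to 113, 26 spaces at columns 114 to 139, and 20 spaces starting at column 165, which brings the record length to 629.
I am trying to pad record type 42 (record length 557) with spaces so that its record length becomes 629, the same as record type 41. That way I can load the file into a single Hive table. The command I used, shown below, gives an error. Can this command be improved so that record type 42 ends up the same length as record type 41? This is a fixed-length file.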
while read line
do
awk '
$2 == "1" {
echo $line >> test_pre.dat
echo "record type: 41";
}
$2 == "2" {
awk 'BEGIN{FS=OFS=""} {$88=" "$88} 1 \
{$114=" "$114} 1 \
{$116=" "$116} 1' test.dat >> test_pre.dat
echo "record type: 42";
}'
done
INPUT:
41310410768228735 354447062622381 0012167121812 110012167121812 110017402445978 06CCF005 61stas-att1.fsabcgroup0-010.ch1il01cvt.ch1il.uvp.els-an.abc.com 60000530400000002998F100F11000000000000000000000000150110192928150110192941150110192949000000080FFFFFFF00000000000000001B702A7C 0000000000000000 FFFFFFFF00 abc:+12167121812@one.abc.com;user=phone abc:+17402445978@one.abc.com;user=phone 000100
42310410755337373 354447061570839 0013133038111 110013133201177 06CCF005 61stas-att1.fsabcgroup0-005.ch1il01cvt.ch1il.uvp.els-an.abc.com 600004C150000000ADE5C100F11000000000000000100000000150110192815150110192822150110192950000000580000000000000000000000001B702BC9 0000000000000000 FFFFFFFF00 abc:+13133201177;oli=63@abcgroup0-001-dtrtmiapca0.cl1oh.uvp.els-tel:+13133038111;npdi 000100
OUTPUT:
41310410768228735 354447062622381 0012167121812 110012167121812 110017402445978 06CCF005 61stas-att1.fsabcgroup0-010.ch1il01cvt.ch1il.uvp.els-an.abc.com 60000530400000002998F100F11000000000000000000000000150110192928150110192941150110192949000000080FFFFFFF00000000000000001B702A7C 0000000000000000 FFFFFFFF00 abc:+12167121812@one.abc.com;user=phone abc:+17402445978@one.abc.com;user=phone 000100
42310410755337373 354447061570839 0013133038111 110013133201177 0 6CCF005 61stas-att1.fsabcgroup0-005.ch1il01cvt.ch1il.uvp.els-an.abc.com 600004C150000000ADE5C100F11000000000000000100000000150110192815150110192822150110192950000000580000000000000000000000001B702BC9 0000000000000000 FFFFFFFF00 abc:+13133201177;oli=63@abcgroup0-001-dtrtmiapca0.cl1oh.uvp.els-tel:+13133038111;npdi 000100
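Answer 0 (score: 1)
A while loop needs the do and done keywords.
In awk, $2 is the second whitespace-separated field, not the second character of the line, so match the record type against the start of the line instead:
awk '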
/^.1/ {
print > "test_pre.dat"
print NR ": record type: 41"
}
/^.2/ {
printf("%s%-143s%s\n", substr($0, 1,114), "0", substr($0,114)) > "test_pre.dat"
print NR ": record type: 42"
}
' test.dat
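If the column positions given in the question are taken at face value (blanks filling columns 88 to 139 and columns 165 to 184 of the 629-character layout), the padding could be sketched as follows. The cut points 87, 88-112 and 113 are derived from that description and have not been checked against the real file layout; the file names and the synthetic records are hypothetical and exist only to check the resulting lengths.

```shell
#!/bin/sh
# Sketch: pad type-42 records (557 chars) to the type-41 length (629 chars).
# Assumption: blanks belong at columns 88-139 and 165-184 of the 629-char layout.
workdir=$(mktemp -d)
infile="$workdir/test.dat"        # hypothetical input path
outfile="$workdir/test_pre.dat"   # hypothetical output path

# Build two synthetic records: a 629-char type 41 and a 557-char type 42.
awk '
function rep(ch, n,   s) { while (n-- > 0) s = s ch; return s }
BEGIN {
    printf("41%s\n", rep("A", 627))
    printf("42%s\n", rep("B", 555))
}' > "$infile"

awk '
/^41/ { print; next }    # already full length, copy unchanged
/^42/ {
    # chars 1-87, 52 blanks, chars 88-112, 20 blanks, chars 113 to end
    printf("%s%52s%s%20s%s\n",
           substr($0, 1, 87), "", substr($0, 88, 25), "", substr($0, 113))
}' "$infile" > "$outfile"

# Print each output record number with its length.
awk '{ print NR ": " length($0) }' "$outfile"
```

The last command prints the length of every output record, so both lines should report 629 if the assumed cut points add up (87 + 52 + 25 + 20 + 445).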
To avoid hard-coding the output file, pass it in with -v:
awk -v output_file="$outfile" '
/^.1/ {
print > output_file
...
' "$infile"
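A self-contained variant of this pattern might look like the sketch below. The paths and the two sample records are hypothetical, and the patterns match the full two-character record type rather than the answer's /^.1/ form; records go to the file named by output_file while the status messages go to standard output.

```shell
#!/bin/sh
# Sketch: pass the output file name into awk with -v instead of hard-coding it.
workdir=$(mktemp -d)
infile="$workdir/test.dat"        # hypothetical input path
outfile="$workdir/test_pre.dat"   # hypothetical output path

# Two short made-up records, one per record type.
printf '41first record\n42second record\n' > "$infile"

# Records are written to output_file; status lines are captured from stdout.
report=$(awk -v output_file="$outfile" '
/^41/ { print > output_file; print NR ": record type: 41" }
/^42/ { print > output_file; print NR ": record type: 42" }
' "$infile")

printf '%s\n' "$report"
```

Inside awk, the > redirection truncates the file only when it is first opened during the run, and later print statements to the same name append to it, so both records end up in the output file.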