Suppose I have a file whose lines have the following format:
SOME_ATTRIBUTE_1 XYZ; IMPORTANT_ATTRIBUTE_1 1234; SOME_ATTRIBUTE_2 XYZ; IMPORTANT_ATTRIBUTE_2 AB;
I want to convert it to the following form, where the values of the two important attributes are joined into a new attribute:
JOIN_IMPORTANT_ATTRIBUTE AB1234; SOME_ATTRIBUTE_1 XYZ; IMPORTANT_ATTRIBUTE_1 1234; SOME_ATTRIBUTE_2 XYZ; IMPORTANT_ATTRIBUTE_2 AB;
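Can this be done with an awk one-liner or something similar? I can't think of a way to solve it without reaching for the Java toolbox.
Answer 0 (score: 2)
With awk, you can split the input on a semicolon followed by any number of spaces, and then split the important fields further, like this: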
awk -F'; *' '{ split($2, a1, / +/); split($4, a2, / +/); print "JOIN_IMPORTANT_ATTRIBUTE", a2[2] a1[2] ";", $0 }' infile
Output:
JOIN_IMPORTANT_ATTRIBUTE AB1234; SOME_ATTRIBUTE_1 XYZ; IMPORTANT_ATTRIBUTE_1 1234; SOME_ATTRIBUTE_2 XYZ; IMPORTANT_ATTRIBUTE_2 AB;
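This assumes you know which columns the important attributes are in (here, the second and fourth semicolon-separated fields).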
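If the column positions are not known in advance, one possible variation is to look the attributes up by name instead. The following is only a sketch under that assumption: the function name `join_important` is hypothetical, and it assumes each field has the form `NAME VALUE` and that the two attributes are literally called `IMPORTANT_ATTRIBUTE_1` and `IMPORTANT_ATTRIBUTE_2`, as in the question.

```shell
# join_important (hypothetical name): prefix each line with
# JOIN_IMPORTANT_ATTRIBUTE, finding the two important attributes
# by name rather than by column position.
join_important() {
    awk -F'; *' '{
        v1 = v2 = ""
        for (i = 1; i <= NF; i++) {
            # each field is assumed to look like "NAME VALUE"
            n = split($i, kv, / +/)
            if (n < 2) continue
            if (kv[1] == "IMPORTANT_ATTRIBUTE_1") v1 = kv[2]
            if (kv[1] == "IMPORTANT_ATTRIBUTE_2") v2 = kv[2]
        }
        # value of attribute 2 first, then attribute 1, as in the question
        print "JOIN_IMPORTANT_ATTRIBUTE", v2 v1 ";", $0
    }' "$@"
}
```

It can be called as `join_important infile`, or with no arguments to read from standard input.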
Answer 1 (score: 1)
A Perl solution:
perl -lane 'print join " ", "JOIN_IMPORTANT_ATTRIBUTE", substr($F[7], 0, -1) . $F[3], @F'
Answer 2 (score: 1)
awk -F'[; ]+' '{print "JOIN_IMPORTANT_ATTRIBUTE", $8 $4 "; " $0}' file
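To see why `$8` and `$4` are the right fields here, it can help to print how awk numbers the fields when the separator is any run of semicolons and spaces. This is just an illustrative sketch; the helper name `show_fields` is hypothetical and not part of the answer.

```shell
# show_fields (hypothetical helper): print every field of one line
# together with its number, using the same -F'[; ]+' separator.
show_fields() {
    printf '%s\n' "$1" |
        awk -F'[; ]+' '{ for (i = 1; i <= NF; i++) printf "$%d = [%s]\n", i, $i }'
}

show_fields 'SOME_ATTRIBUTE_1 XYZ; IMPORTANT_ATTRIBUTE_1 1234; SOME_ATTRIBUTE_2 XYZ; IMPORTANT_ATTRIBUTE_2 AB;'
```

With this separator, names and values land in alternating fields, so the value of `IMPORTANT_ATTRIBUTE_1` is `$4` and the value of `IMPORTANT_ATTRIBUTE_2` is `$8`. Because the line ends with a semicolon, awk also reports one trailing empty field.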
Answer 3 (score: 1)
Here is my bash + awk alternative.
cat attrs.awk
# Awk script to get joined attributes for one line of attributes
BEGIN {
    RS = ";"
    # gawk only: iterate arrays in ascending numeric index order, so that
    # IMPORTANT_ATTRIBUTE_n comes before IMPORTANT_ATTRIBUTE_n+1
    PROCINFO["sorted_in"] = "@ind_num_asc"
}
$1 ~ /^IMPORTANT_ATTRIBUTE_/ {
    attrId = substr($1, 1 + length("IMPORTANT_ATTRIBUTE_"))
    if ($2 ~ /^[0-9]/)
        impAttrsNum[attrId] = $2
    else
        impAttrsAlpha[attrId] = $2
}
END {
    # alpha attribs come before num attribs
    for (i in impAttrsAlpha)
        alphaVals = alphaVals impAttrsAlpha[i]
    for (i in impAttrsNum)
        numVals = numVals impAttrsNum[i]
    printf("JOIN_IMPORTANT_ATTRIBUTE %s%s%s", alphaVals, numVals, RS)
}
cat joinattrs
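#!/bin/bash
#
# Applies joined attributes for each input line
# (IFS= and -r keep leading whitespace and backslashes in the line intact)
while IFS= read -r l
do
    if [[ -n "$l" ]]
    then
        joinAttrs=$(printf '%s\n' "$l" | awk -f attrs.awk)
        printf '%s %s\n' "$joinAttrs" "$l"
    fi
done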
How to use it: ./joinattrs < datafile
Not a one-liner :)