I have been playing around with awk and sed. I have a file in the following format:
0000098236|Q1.1|one|Q2.1|one|Q3.1|one
0000027965|Q1.5|five|Q1.1|one|Q2.1|one
0000083783|Q1.1|one|Q1.5|five|Q2.1|one
0000027965|Q1.1|one|Q1.1|one|Q1.5|five
0000083983|Q1.1|one|Q1.5|five|Q2.1|one
0000083993|Q1.3|three|Q1.4|four|Q1.2|two
I want to convert each QX.X code to a specific number. I did that with sed:
sed -e "s/\<Q1.1\>/88/g" |
sed -e "s/Q1.2/89/g" |
sed -e "s/Q1.3/90/g" |
sed -e "s/Q1.4/91/g" |
sed -e "s/Q1.5/92/g" |
and so on. So far so good. After running this, I get:
0000098236|88|one|88|one|88|one
0000027965|92|five|88|one|88|one
0000083783|88|one|92|five|88|one
0000027965|88|one|88|one|92|five
0000083983|88|one|92|five|88|one
0000083993|90|three|91|four|89|two
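The substitutions shown above do not need one sed process each; a single invocation accepts several -e expressions. A minimal sketch, assuming only the five mappings listed (the mappings behind "and so on" are left out), with map_codes as a made-up helper name. The dots are escaped so that they match a literal dot rather than any character:

```shell
# map_codes: hypothetical helper; reads the named files, or stdin if none are given.
# Only the five mappings shown in the question are included here.
map_codes() {
  sed -e 's/Q1\.1/88/g' \
      -e 's/Q1\.2/89/g' \
      -e 's/Q1\.3/90/g' \
      -e 's/Q1\.4/91/g' \
      -e 's/Q1\.5/92/g' "$@"
}
```

Called as `map_codes inputfile`, it prints the converted lines to standard output.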
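The delimiter is a pipe. Now I need to remove duplicate pairs, where
88|one
is one pair. After removing the duplicates, the file above should look like this: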
0000098236|88|one
0000027965|92|five|88|one
0000083783|88|one|92|five
0000027965|88|one|92|five
0000083983|88|one|92|five
0000083993|90|three|91|four|89|two
I tried using awk with arrays but could not get it to work.
Answer 0 (score: 2)
sed -r ':a s#([0-9]+\|[a-z]+)(.*)\1#\1\2#; ta; s#\|\|+#|#g; s#\|$##' FILE
0000098236|88|one
0000027965|92|five|88|one
0000083783|88|one|92|five
0000027965|88|one|92|five
0000083983|88|one|92|five
0000083993|90|three|91|four|89|two
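Answer 1 (score: 2)
This removes the need for the preprocessing step. It assumes that the digit after the decimal point is what determines the replacement.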
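awk '
BEGIN {
    # rep[n] is the replacement for a code ending in ".n" (rep[1] = 88, ..., rep[5] = 92)
    r = "88 89 90 91 92"
    split(r, rep)
    FS = OFS = "|"
}
{
    delete seen          # forget the values seen on the previous line
    cf = i = 2           # cf: pair being read, i: slot for the next pair kept
    while (cf < NF) {
        split($cf, a, ".")
        newval = rep[a[2]]
        if (!seen[newval]) {
            # first time this value appears on the line: keep the pair
            $i = newval
            $(i + 1) = $(cf + 1)
            seen[newval] = 1
            nf = i + 1
            i += 2
        }
        cf += 2
    }
    NF = nf              # drop the fields left over after compaction
    print
}' inputfile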
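The script above treats a pair as a duplicate as soon as its number repeats. If the whole number|text pair should be the key instead, something along these lines may work on the already converted file. This is only a sketch: dedupe_pairs is a made-up name, and it assumes that field 1 is the ID and that every following field belongs to a pair (a trailing unpaired field is dropped):

```shell
# dedupe_pairs: hypothetical helper; reads the named files, or stdin if none are given.
# Keeps the first occurrence of each number|text pair on a line.
dedupe_pairs() {
  awk '
    BEGIN { FS = OFS = "|" }
    {
      split("", seen)              # portable way to empty the array for each line
      out = $1                     # field 1 is assumed to be the ID
      for (i = 2; i < NF; i += 2) {
        pair = $i OFS $(i + 1)     # key on the whole pair, e.g. "88|one"
        if (!(pair in seen)) {
          seen[pair] = 1
          out = out OFS pair
        }
      }
      print out
    }
  ' "$@"
}
```

Building the output in a separate string avoids assigning to NF, which not every awk handles the same way.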
Answer 2 (score: 1)
TXR:
@(do (defun rem-dupes (pairs : recur)
(if (null pairs)
nil
(let ((front (first pairs))
(tail (rem-dupes (rest pairs) t)))
(if (memqual front tail)
(if recur
(remqual front tail)
(cons front (remqual front tail)))
(cons (first pairs) tail))))))
@(collect :vars nil)
@(freeform 1)
@id|@(coll)@left|@right@/[|\n]/@(end)
@(bind pairs @(rem-dupes [mapcar list left right]))
@(set left @[mapcar first pairs])
@(set right @[mapcar second pairs])
@(output)
@id@(rep)|@left|@right@(end)
@(end)
@(end)
Run it:
$ txr data.txr data.txt
0000098236|88|one
0000027965|92|five
0000083783|88|one|92|five
0000027965|88|one|92|five
0000083983|88|one|92|five
0000083993|90|three|91|four|89|two
Answer 3 (score: 0)
This might work for you:
sed ':a;s/\(\([0-9]*|[^|]*\).*\)|\2/\1/;ta' file
0000098236|88|one
0000027965|92|five|88|one
0000083783|88|one|92|five
0000027965|88|one|92|five
0000083983|88|one|92|five
0000083993|90|three|91|four|89|two
In fact, all of the file processing can be done with a single instance of sed:
cat <<\! >file.sed
> 1{x;s/$/.1|88.2|89.3|90.4|91.5|92/;x} # stuff lookup into hold space .key|value
> s/|Q[^.]*/|/g # guessing here - remove Q and number prefix
> :a;s/\(\(\.[^|]*|[^|]*\).*\)|\2/\1/;ta # remove duplicate fields
> G # append newline and lookup table
> :b;s/\(\.[^|]*\)\(.*\n.*\)\1|\([^.]*\)/\3\2/;tb # replace key with value from lookup
> s/\n.*// # remove lookup table
> !
sed -f file.sed original_file
0000098236|88|one
0000027965|92|five|88|one
0000083783|88|one|92|five
0000027965|88|one|92|five
0000083983|88|one|92|five
0000083993|90|three|91|four|89|two