I have reproduced my problem in the test file below:
# cat out
2014-01-10 18:23:25 0 Andy/ADPTER/
2014-01-10 18:23:36 503 Sandy/ADPTER/ACCOUNTTYPE MAP.csv
2014-01-10 18:23:38 516 John/ADPTER/CITY MAP.csv
2014-01-10 18:23:38 398 Wendy/ADPTER/COUNTRY MAP.csv
2014-01-10 18:23:38 11117 Andy/ADPTER/CURRENCY MAP.csv
2014-01-10 18:23:38 260 Sandy/ADPTER/GENDER MAP.csv
2014-01-10 18:23:39 466 John/ADPTER/STATE MAP.csv
2014-01-10 18:23:40 373 Jim/ADPTER/UNITS MAP.csv
Here is my Bash variable:
# echo $bucket
bucket_name
So, in the file above, I want the 4th field to be prefixed with the value of the Bash variable.
This is the output I want:
2014-01-10 18:23:25 0 bucket_name/Andy/ADPTER/
2014-01-10 18:23:36 503 bucket_name/Sandy/ADPTER/ACCOUNTTYPE MAP.csv
2014-01-10 18:23:38 516 bucket_name/John/ADPTER/CITY MAP.csv
2014-01-10 18:23:38 398 bucket_name/Wendy/ADPTER/COUNTRY MAP.csv
2014-01-10 18:23:38 11117 bucket_name/Andy/ADPTER/CURRENCY MAP.csv
2014-01-10 18:23:38 260 bucket_name/Sandy/ADPTER/GENDER MAP.csv
2014-01-10 18:23:39 466 bucket_name/John/ADPTER/STATE MAP.csv
2014-01-10 18:23:40 373 bucket_name/Jim/ADPTER/UNITS MAP.csv
Here is what I tried:
# awk -v var=$bucket '{$4=var"/"$4; print}' out
2014-01-10 18:23:25 0 bucket_name/Andy/ADPTER/
2014-01-10 18:23:36 503 bucket_name/Sandy/ADPTER/ACCOUNTTYPE MAP.csv
2014-01-10 18:23:38 516 bucket_name/John/ADPTER/CITY MAP.csv
2014-01-10 18:23:38 398 bucket_name/Wendy/ADPTER/COUNTRY MAP.csv
2014-01-10 18:23:38 11117 bucket_name/Andy/ADPTER/CURRENCY MAP.csv
2014-01-10 18:23:38 260 bucket_name/Sandy/ADPTER/GENDER MAP.csv
2014-01-10 18:23:39 466 bucket_name/John/ADPTER/STATE MAP.csv
2014-01-10 18:23:40 373 bucket_name/Jim/ADPTER/UNITS MAP.csv
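Problem:
My awk command does what I need; however, it messes up the field spacing (the delimiters?). My intention is to only prefix bucket_name/ to the 4th field and to preserve whatever spacing scheme the input file has (including right/left-aligned fields).
Here is another attempt of mine: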
# awk -v var=$bucket 'BEGIN{OFS="\t"}{$4=var"/"$4; print}' out
2014-01-10 18:23:25 0 bucket_name/Andy/ADPTER/
2014-01-10 18:23:36 503 bucket_name/Sandy/ADPTER/ACCOUNTTYPE MAP.csv
2014-01-10 18:23:38 516 bucket_name/John/ADPTER/CITY MAP.csv
2014-01-10 18:23:38 398 bucket_name/Wendy/ADPTER/COUNTRY MAP.csv
2014-01-10 18:23:38 11117 bucket_name/Andy/ADPTER/CURRENCY MAP.csv
2014-01-10 18:23:38 260 bucket_name/Sandy/ADPTER/GENDER MAP.csv
2014-01-10 18:23:39 466 bucket_name/John/ADPTER/STATE MAP.csv
2014-01-10 18:23:40 373 bucket_name/Jim/ADPTER/UNITS MAP.csv
But that did not help either.
Thanks.
Answer 0 (score: 3)
You tagged the question with Perl, so here is a Perl solution:
perl -pe'BEGIN{$var=shift}s,(?:.*?\s+){3}\K,$var/,' "$bucket" out
It is technically the same solution as the sed-based answer, but it has the benefit of avoiding escaping problems: the shell variable $bucket can contain anything.
Answer 1 (score: 2)
You can use sed.
$ bucket='bucket_name'
$ sed "s~^\(\([^[:blank:]]\+[[:blank:]]\+\)\{3\}\)~\1$bucket/~" file
2014-01-10 18:23:25 0 bucket_name/Andy/ADPTER/
2014-01-10 18:23:36 503 bucket_name/Sandy/ADPTER/ACCOUNTTYPE MAP.csv
2014-01-10 18:23:38 516 bucket_name/John/ADPTER/CITY MAP.csv
2014-01-10 18:23:38 398 bucket_name/Wendy/ADPTER/COUNTRY MAP.csv
2014-01-10 18:23:38 11117 bucket_name/Andy/ADPTER/CURRENCY MAP.csv
2014-01-10 18:23:38 260 bucket_name/Sandy/ADPTER/GENDER MAP.csv
2014-01-10 18:23:39 466 bucket_name/John/ADPTER/STATE MAP.csv
2014-01-10 18:23:40 373 bucket_name/Jim/ADPTER/UNITS MAP.csv
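[[:blank:]]\+ is a POSIX character class that matches any horizontal whitespace character (space or tab), one or more times. [^[:blank:]]\+ is the negated POSIX character class; it matches any character that is not horizontal whitespace, one or more times.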
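As a small sketch of how these classes behave on mixed whitespace, the example below feeds sed a made-up line that uses a tab after the first field and a run of spaces before the third. It spells the quantifier as `\{1,\}` rather than `\+`, since `\+` in a basic regular expression is a GNU extension; that swap is a portability assumption on my part, not part of the answer above.

```shell
#!/bin/sh
bucket='bucket_name'

# Made-up sample line: tab after the date, right-aligned size column.
line=$(printf '%s\t%s %10s %s' 2014-01-10 18:23:36 503 'Sandy/ADPTER/ACCOUNTTYPE MAP.csv')

# Capture the first three "field + blanks" groups and re-emit them
# unchanged (\1), followed by the prefix.
result=$(printf '%s\n' "$line" |
  sed "s~^\(\([^[:blank:]]\{1,\}[[:blank:]]\{1,\}\)\{3\}\)~\1$bucket/~")

printf '%s\n' "$result"
```

The tab and the padding spaces survive because they are part of the captured text that `\1` writes back.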
Answer 2 (score: 2)
You can use this awk:
bucket="bucket_name"
awk --re-interval -v b="$bucket" '{sub(/([^[:blank:]]+[[:blank:]]+){3}/,
"&" b "/")} 1' file
2014-01-10 18:23:25 0 bucket_name/Andy/ADPTER/
2014-01-10 18:23:36 503 bucket_name/Sandy/ADPTER/ACCOUNTTYPE MAP.csv
2014-01-10 18:23:38 516 bucket_name/John/ADPTER/CITY MAP.csv
2014-01-10 18:23:38 398 bucket_name/Wendy/ADPTER/COUNTRY MAP.csv
2014-01-10 18:23:38 11117 bucket_name/Andy/ADPTER/CURRENCY MAP.csv
2014-01-10 18:23:38 260 bucket_name/Sandy/ADPTER/GENDER MAP.csv
2014-01-10 18:23:39 466 bucket_name/John/ADPTER/STATE MAP.csv
2014-01-10 18:23:40 373 bucket_name/Jim/ADPTER/UNITS MAP.csv
-v b="$bucket" # pass a value to awk in variable b
--re-interval # Enable the use of interval
# expressions in regular expression matching
sub # match input using regex and substitute with
# the given string
([^[:blank:]]+[[:blank:]]+){3} # match first 3 fields of the line separated by space/tab
"&" b "/" # replace by matched string + var b + /
EDIT: (thanks to @EdMorton) To make it work for any value in the argument (for example, try both solutions with bucket="&"), use:
awk --re-interval -v b="$bucket" 'match($0, /([^[:blank:]]+[[:blank:]]+){3}/) {
$0 = substr($0, 1, RLENGTH) b "/" substr($0, RLENGTH+1) } 1' file
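For awk implementations that lack interval expressions or the --re-interval option, the same match/substr idea can be sketched by spelling out the three groups by hand. This variant also uses `[ \t]` instead of `[[:blank:]]`; both substitutions are assumptions made for portability, and the sample line is made up.

```shell
#!/bin/sh
# A prefix that sub() would misinterpret, since "&" means "the matched text" there.
bucket='&'

# Made-up sample line with a right-aligned size column.
line=$(printf '%s %s %10s %s' 2014-01-10 18:23:36 503 'Sandy/ADPTER/ACCOUNTTYPE MAP.csv')

result=$(printf '%s\n' "$line" | awk -v b="$bucket" '
  # Three "non-blank run + blank run" groups, written out instead of {3}.
  match($0, /^[^ \t]+[ \t]+[^ \t]+[ \t]+[^ \t]+[ \t]+/) {
    # Splice the prefix in by position, so b is never treated as a pattern.
    $0 = substr($0, 1, RLENGTH) b "/" substr($0, RLENGTH + 1)
  }
  1')

printf '%s\n' "$result"
```

Assigning to $0 as a whole (rather than to $4) is what keeps awk from rejoining the fields with OFS.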
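Answer 3 (score: 1)
This is a bit tricky in awk, but there is a relevant GNU extension: in gawk, the split function takes an optional fourth argument that stores the actual field separators for later use. Using it: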
gawk -v bucket="$bucket" '{ split($0, f, FS, d); d[NF] = ORS; f[4] = bucket "/" f[4]; for(i = 1; i <= NF; ++i) printf("%s%s", f[i], d[i]); }' filename
That is:
{
split($0, f, FS, d) # split line into fields, saving fields in
# the f and delimiters in the d array
d[NF] = ORS # for the newline at the end
f[4] = bucket "/" f[4] # fix fourth field
for(i = 1; i <= NF; ++i) { # then print the fields separated by the
printf("%s%s", f[i], d[i]); # saved delimiters
}
}
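As background on why a plain field assignment loses the alignment: when any field is assigned, awk rebuilds $0 by joining the fields with OFS (a single space by default), so the original runs of whitespace are discarded. A minimal sketch of that effect, using a made-up line whose third column is padded to an assumed width of 10:

```shell
#!/bin/sh
# Made-up sample line with a right-aligned size column.
line=$(printf '%s %s %10s %s' 2014-01-10 18:23:25 0 Andy/ADPTER/)

# Assigning to $4 forces awk to rebuild the record with single spaces.
rebuilt=$(printf '%s\n' "$line" | awk '{ $4 = "bucket_name/" $4; print }')

# Brackets make the difference in padding visible.
printf 'before: [%s]\nafter:  [%s]\n' "$line" "$rebuilt"
```

This is why the approaches here either save the separators or edit the record as a string instead of touching individual fields.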
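Addendum: Unless the variable comes from a trusted source and is guaranteed not to contain metacharacters, I can't really recommend doing this with sed (otherwise you have a code injection problem). That said, the simple way with sed is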
sed "s|[[:space:]]\+|&${bucket}/|3" filename
...which appends ${bucket}/ after the third occurrence of [[:space:]]\+.
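If sed has to be used with a value that may contain metacharacters, one possible mitigation is to escape the value before interpolating it. The sketch below escapes backslash, `&`, and the `|` delimiter; it assumes the value contains no newlines, uses `\{1,\}` in place of the GNU-specific `\+`, and the variable name esc and the sample line are made up for the demo.

```shell
#!/bin/sh
# Hypothetical prefix containing "&" and the "|" delimiter.
bucket='a&b|c'

# Prefix each backslash, "&", and "|" with a backslash so that sed
# treats them literally in the replacement text.
esc=$(printf '%s' "$bucket" | sed 's/[\\&|]/\\&/g')

# Made-up sample line with a right-aligned size column.
line=$(printf '%s %s %10s %s' 2014-01-10 18:23:36 503 'Sandy/ADPTER/ACCOUNTTYPE MAP.csv')

# The trailing 3 tells sed to act only on the third whitespace run.
result=$(printf '%s\n' "$line" | sed "s|[[:space:]]\{1,\}|&${esc}/|3")

printf '%s\n' "$result"
```

This only covers the replacement side of the s command; passing the value as data, as the Perl and awk answers do, avoids the problem altogether.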
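Answer 4 (score: 1)
If you want to stick with awk, explicitly providing a format string is probably the simplest approach. Note that this hard-codes the column widths and prints only $4, so any part of a name that follows a space (such as "MAP.csv") is dropped as written: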
awk '{printf "%s %s %10s %s/%s\n", $1, $2, $3, b, $4}' b="$bucket" out
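Since the names in the sample data contain spaces, a variation on this format-string idea can rejoin fields 4 through NF before printing. This is only a sketch: the width of 10 is carried over from the command above, the variable name is hypothetical, and multiple spaces inside a name would still collapse to one.

```shell
#!/bin/sh
bucket='bucket_name'

# Made-up sample line with a right-aligned size column.
line=$(printf '%s %s %10s %s' 2014-01-10 18:23:36 503 'Sandy/ADPTER/ACCOUNTTYPE MAP.csv')

result=$(printf '%s\n' "$line" | awk -v b="$bucket" '{
  # Rebuild the name from field 4 onward so that "MAP.csv" is kept.
  name = $4
  for (i = 5; i <= NF; i++) name = name " " $i
  printf "%s %s %10s %s/%s\n", $1, $2, $3, b, name
}')

printf '%s\n' "$result"
```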