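Please help me with this small script. I am trying to fill some columns with values taken from a big tab-separated file (mainFileWithValues.txt), which has the following format:
A B C ... (700 columns in total)
80 2.08 23
14 1.88 30
12 1.81 40
The names of the columns I want are in columnnam.nam (up to 20 names):
cat columnnam.nam
A
B
...
For each name, I first get its column number in the big file with the command below, and then extract the values with cut in a for loop: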
sed -n "1 s/${i}.*//p" mainFileWithValues.txt | sed 's/[^\t*]//g' |wc -c
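The command above finds the column number indirectly: the first sed deletes everything from the first occurrence of the column name to the end of the header line, the second sed deletes every character except tabs (and *), and wc -c counts the remaining tabs plus the trailing newline, which equals the 1-based field number. Because the name is matched as a substring, a name like A would also match inside a header such as BA. The sketch below is one way to do the same lookup with an exact field comparison in awk. It assumes a tab-separated header on line 1, and col_index is a hypothetical helper name, not part of the original script.

```shell
#!/bin/bash
# col_index NAME FILE  (hypothetical helper)
# Prints the 1-based field number of NAME in the tab-separated header
# (line 1) of FILE, or nothing if NAME is not a header field.
# Whole fields are compared, so "A" does not match inside "BA".
col_index() {
  awk -F'\t' -v name="$1" 'NR == 1 {
    for (i = 1; i <= NF; i++)
      if ($i == name) { print i; break }
    exit                      # only the header line is needed
  }' "$2"
}
```

For example, cut -f "$(col_index B mainFileWithValues.txt)" mainFileWithValues.txt prints column B.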
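My problem is that the loop writes the columns into test.txt one below the other instead of side by side as in the main file. Here is the loop and the resulting test.txt: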
#!/bin/bash
# for each wanted name, append the column with that header to test.txt
for i in `cat columnnam.nam`
do
cut -f`sed -n "1 s/${i}.*//p" mainFileWithValues.txt | sed 's/[^\t*]//g' |wc -c` mainFileWithValues.txt >> test.txt
done
cat test.txt
A
80
14
12
B
2.08
1.88
1.81
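How can I fix this script so that test.txt has the columns side by side, like the main file, i.e.:
A B
80 2.08
14 1.88
12 1.81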
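Since the question's approach already works out each column number, one way to get side-by-side output while still using cut is to collect all the field numbers first and call cut only once with a comma-separated list (cut -f 1,3 prints the selected fields of every line, separated by tabs). A minimal sketch, assuming a tab-separated data file whose first line holds the column names; extract_columns is a hypothetical function name. One limitation: cut always prints fields in the order they appear in the data file, not in the order of the names file.

```shell
#!/bin/bash
# extract_columns NAMES_FILE DATA_FILE  (hypothetical helper)
# Looks up every name from NAMES_FILE in the tab-separated header of
# DATA_FILE, then calls cut once with the comma-separated field list,
# so the selected columns stay side by side on each output line.
extract_columns() {
  local fields
  fields=$(awk -F'\t' '
    NR == FNR { want[$1]; next }          # first file: wanted names
    FNR == 1 {                            # second file: header line only
      for (i = 1; i <= NF; i++)
        if ($i in want) list = list (list == "" ? "" : ",") i
      print list
      exit
    }' "$1" "$2")
  [ -n "$fields" ] || return 1            # none of the names was found
  cut -f "$fields" "$2"
}
```

For example: extract_columns columnnam.nam mainFileWithValues.txt > test.txt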
Answer 0 (score: 4)
Here is a one-liner:
awk 'FNR==NR{h[NR]=$1;next}{for(i=1; i in h; i++){if(FNR==1){for(j=1; j<=NF; j++){if(tolower(h[i])==tolower($j)){d[i]=j; break }}}printf("%s%s",i>1 ? OFS:"", i in d ?$(d[i]):"")}print ""}' columns.nam mainfile
Explanation:
[Note: header matching is case-insensitive; if you want strict matching, remove tolower().]
awk '
FNR==NR{ # here we read the columns.nam file
h[NR]=$1; # h -> array, NR -> array key, $1 -> array value
next # go to the next line
}
{ # here we read the second file
for(i=1; i in h; i++) # iterate over array h
{
if(FNR==1) # on the 1st row of the second file, parse the header
{
for(j=1; j<=NF; j++) # iterate over the fields of the header row
{
# if it was the field we are looking for
if(tolower(h[i])==tolower($j))
{
# then
# d -> array, i -> as array key which is column order number
# j -> as array value which is column number
d[i]=j;
break
}
}
}
# for all records
# if the field we searched for was found, print it
# (d[i] holds its column number); otherwise print an empty string
printf("%s%s",i>1 ? OFS:"", i in d ? $(d[i]): "");
}
# print newline char
print ""
}
' columns.nam mainfile
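The note above mentions removing tolower() for strict matching. Below is a minimal sketch of that case-sensitive variant, wrapped in a hypothetical shell function pick_columns. It also sets -F'\t' and OFS to a tab, assuming the tab-separated layout of mainFileWithValues.txt, and it resolves the header once in its own FNR == 1 block instead of inside the per-name loop. As in the answer, the output follows the order of the names file, and a name missing from the header yields an empty field.

```shell
#!/bin/bash
# pick_columns NAMES_FILE DATA_FILE  (hypothetical helper)
# Case-sensitive version of the answer above, reading and writing tabs.
pick_columns() {
  awk -F'\t' -v OFS='\t' '
    FNR == NR { h[NR] = $1; next }           # names file: requested order
    FNR == 1 {                               # data header: name -> field number
      for (i = 1; i in h; i++)
        for (j = 1; j <= NF; j++)
          if (h[i] == $j) { d[i] = j; break }   # exact match, no tolower()
    }
    {                                        # every data line, header included
      for (i = 1; i in h; i++)
        printf("%s%s", (i > 1 ? OFS : ""), (i in d ? $(d[i]) : ""))
      print ""
    }' "$1" "$2"
}
```

For example: pick_columns columns.nam mainfile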
Test results:
$ cat mainfile
A B C
80 2.08 23
14 1.88 30
12 1.81 40
$ cat columns.nam
A
C
$ awk 'FNR==NR{h[NR]=$1;next}{for(i=1; i in h; i++){if(FNR==1){for(j=1; j<=NF; j++){if(tolower(h[i])==tolower($j)){d[i]=j; break }}}printf("%s%s",i>1 ? OFS:"", i in d ?$(d[i]):"")}print ""}' columns.nam mainfile
A C
80 23
14 30
12 40
You can also save the program as a script file and run it:
akshay@db-3325:/tmp$ cat col_parser.awk
FNR == NR {
h[NR] = $1;
next
}
{
for (i = 1; i in h; i++) {
if (FNR == 1) {
for (j = 1; j <= NF; j++) {
if (tolower(h[i]) == tolower($j)) {
d[i] = j;
break
}
}
}
printf("%s%s", i > 1 ? OFS : "", i in d ? $(d[i]) : "");
}
print ""
}
akshay@db-3325:/tmp$ awk -v OFS="\t" -f col_parser.awk columns.nam mainfile
A C
80 23
14 30
12 40
Answer 1 (score: 3)
Another awk approach:
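awk 'NR == FNR {
  hdr[$1]                   # first file: remember each wanted column name
  next
}
FNR == 1 {
  for (i=1; i<=NF; i++)
    if ($i in hdr)
      h[++n] = i            # header: record matching field numbers in order
}
{
  s=""
  for (k=1; k<=n; k++)      # numeric loop keeps header order; for-in would not guarantee it
    s = s (k > 1 ? OFS : "") $(h[k])
  print s
}' column.nam mainFileWithValues.txt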
A B
80 2.08
14 1.88
12 1.81
To get nicely aligned output, pipe the command above to column -t.