This is what my current working file looks like:
ID   Time  A_in  Time  B_in  Time  C_in
Ax   0.1   10    0.1   15    0.1   45
By   0.2   12    0.2   35    0.2   30
Cz   0.3   20    0.3   20    0.3   15
Fr   0.4   35    0.4   15    0.4   05
Exp  0.5   10    0.5   25    0.5   10
The columns I am interested in are the ones whose headers end in "_in". In those columns, I want to subtract the value of the row whose ID is "Exp" from every other row's value. Take the A_in column, where the "Exp" row holds 10: I want to subtract 10 from all the other elements of that A_in column.
My amateurish attempt is this (I know it is clumsy):
# This part grabs all the values in the Exp row
Exp=$( awk 'BEGIN{OFS="\t";
PROCINFO["sorted_in"] = "@val_num_asc"}
FNR==1 { for (n=2;n<=NF;n++) { if ($n ~ /_GasOut$/) cols[$n]=n; }}
/Exp/ {
for (c in cols){
shift = $cols[c]
printf shift" "
}
}
' File.txt |paste -sd " ")
Exp_array=($Exp)
z=1
for i in "${Exp_array[@]}"
do
z=$(echo 2+$z | bc -l)
Exp_point=$i
awk -vd="$Exp_point" -vloop="$z" -v '
BEGIN{OFS="\t";
PROCINFO["sorted_in"] = "@val_num_asc"}
function abs(x) {return x<0?-x:x}
FNR==1 { for (n=2;n<=NF;n++) { if ($n ~ /_GasOut$/) cols[$n]=n; }}
NR>2{
$loop=abs($loop-d); print
}
' File.txt
done
My first desired outcome is:
ID   Time  A_in  Time  B_in  Time  C_in
Ax   0.1   0.0   0.1   10    0.1   35
By   0.2   02    0.2   10    0.2   20
Cz   0.3   10    0.3   05    0.3   05
Fr   0.4   25    0.4   10    0.4   05
Exp  0.5   0.0   0.5   0.0   0.5   0.0
Now, from each "_in" column, I want to find the IDs corresponding to the 2 smallest values. So my second desired outcome is:
A_in  B_in  C_in
Ax    Cz    Cz
By    Exp   Fr
Exp         Exp
Answer 0 (score: 2)
Perl to the rescue!
#!/usr/bin/perl
use warnings;
use strict;
use feature qw{ say };

@ARGV = (@ARGV[0, 0]);  # Read the input file twice.

# First pass: remember the header and the positions of the *_in columns,
my @header = split ' ', <>;
my @in = grep $header[$_] =~ /_in$/, 0 .. $#header;

# then skip to the last line and keep it as the Exp row.
$_ = <> until eof;
my @exp = split;

my @min;
<>;                     # Second pass: skip the header line.
while (<>) {
    my @F = split;
    for my $i (@in) {
        # Replace each *_in value by its absolute difference from the Exp row.
        $F[$i] = abs($F[$i] - $exp[$i]);
        # Keep the two smallest [value, ID] pairs seen so far for this column
        # (the final Exp row itself is excluded by the "unless eof").
        @{ $min[$i] }[0, 1]
            = sort { $a->[0] <=> $b->[0] }
              [$F[$i], $F[0]], grep defined, @{ $min[$i] // [] }
            unless eof;
    }
    say join "\t", @F;
}

print "\n";
say join "\t", @header[@in];
for my $index (0, 1) {
    for my $i (@in) {
        next unless $header[$i] =~ /_in$/;
        print $min[$i][$index][1], "\t";
    }
    print "\n";
}
It reads the file twice. On the first pass it only remembers the first line as the @header array and the last line as the @exp array.
On the second pass it subtracts the corresponding exp value from each _in column, and it also keeps the two smallest numbers in the @min array, at the positions corresponding to the columns.
Formatting of the numbers (i.e. 0.0 instead of 0, and 02 instead of 2) is left as an exercise for the reader, as is redirecting the output to several different files.
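If one did want the zero-padded look from the question (0.0 for a zero difference, 02 for 2), one possibility, not part of the original answer and untested, is to pipe the script's output through a small awk filter; it assumes the 7-column tab-separated layout above and leaves the trailing ID table alone:

awk 'BEGIN { FS = OFS = "\t" }
NR==1 || NF<7 { print; next }              # header and trailing ID table pass through
{
    for (i = 3; i <= NF; i += 2)           # the *_in columns are fields 3, 5 and 7
        $i = ($i == 0 ? "0.0" : sprintf("%02d", $i))
    print
}'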
Answer 1 (score: 1)
After an hour or two of fun I wrote this abomination:
cat <<EOF >file
ID Time A_in Time B_in Time C_in
Ax 0.1 10 0.1 15 0.1 45
By 0.2 12 0.2 35 0.2 30
Cz 0.3 20 0.3 20 0.3 15
Fr 0.4 35 0.4 15 0.4 05
Exp 0.5 10 0.5 25 0.5 10
EOF
# fix stackoverflow formatting
# input file should be separated with tabs
<file tr -s ' ' | tr ' ' '\t' > file2
mv file2 inputfile
# read headers to an array
IFS=$'\t' read -r -a hdrs < <(head -n1 inputfile)
# exp line read into an array
IFS=$'\t' read -r -a exps < <(grep -m1 $'^Exp\t' inputfile)
# column count
colcnt="${#hdrs[@]}"
if [ "$colcnt" -eq 0 ]; then
echo >&2 "ERROR - must be at least one column"
exit 1
fi
# numbers of those columns which headers have _in suffix
incolnums=$(
paste <(
printf "%s\n" "${hdrs[@]}"
) <(
# puff, the numbers will start from zero cause bash indexes arrays from zero
# but `cut` indexes fields from 1, so.. just keep in mind it's from 0
seq 0 $((colcnt - 1))
) |
grep $'_in\t' |
cut -f2
)
# read the input file
{
# preserve header line
IFS= read -r hdrline
( IFS=$'\t'; printf "%s\n" "$hdrline" )
# ok. read the file field by field
# I think we could awk here
while IFS=$'\t' read -r -a vals; do
# for each column number with _in suffix
while IFS= read -r incolnum; do
# update the column value
# I use bc for float calculations
vals[$incolnum]=$(bc <<-EOF
define abs(i) {
if (i < 0) return (-i)
return (i)
}
scale=2
abs(${vals[$incolnum]} - ${exps[$incolnum]})
EOF
)
done <<<"$incolnums"
# output the line
( IFS=$'\t'; printf "%s\n" "${vals[*]}" )
done
} < inputfile > MyFirstDesiredOutcomeIsThis.txt
# ok so, first part done
{
# output headers names with _in suffix
printf "%s\n" "${hdrs[@]}" |
grep '_in$' |
tr '\n' '\t' |
# omg, fix tr, so stupid
sed 's/\t$/\n/'
# puff
# output the corresponding ID of 2 smallest values of the specified column number
# @arg: $1 column number
tmpf() {
# remove header line
<MyFirstDesiredOutcomeIsThis.txt tail -n+2 |
# extract only this column
cut -f$(($1 + 1)) |
# unique numeric sort and extract two smallest values
sort -n -u | head -n2 |
# now, well, extract the id's that match the numbers
# append numbers with tab (to match the separator)
# suffix numbers with dollar (to match end of line)
sed 's/^/\t/; s/$/$/;' |
# how good is grep at buffering(!)
grep -f /dev/stdin <(
<MyFirstDesiredOutcomeIsThis.txt tail -n+2 |
cut -f1,$(($1 + 1))
) |
# extract numbers only
cut -f1
}
# the following is something like foldr $'\t' $(tmpf ...) for each $incolnums
# we need to buffer here, we are joining the output column-wise
output=""
while IFS= read -r incolnum; do
output=$(<<<$output paste - <(tmpf "$incolnum"))
done <<<"$incolnums"
# because we start with an empty $output, paste inserts leading tabs
# remove them ... and finally output $output
<<<"$output" cut -f2-
} > MySecondDesiredOutcomeIs.txt
# fix formatting to post it on stackoverflow
# files have tabs, and column will output them with space
# which is just enough
echo '==> MyFirstDesiredOutcomeIsThis.txt <=='
column -t -s$'\t' MyFirstDesiredOutcomeIsThis.txt
echo
echo '==> MySecondDesiredOutcomeIs.txt <=='
column -t -s$'\t' MySecondDesiredOutcomeIs.txt
The script outputs:
==> MyFirstDesiredOutcomeIsThis.txt <==
ID   Time  A_in  Time  B_in  Time  C_in
Ax   0.1   0     0.1   10    0.1   35
By   0.2   2     0.2   10    0.2   20
Cz   0.3   10    0.3   5     0.3   5
Fr   0.4   25    0.4   10    0.4   5
Exp  0.5   0     0.5   0     0.5   0

==> MySecondDesiredOutcomeIs.txt <==
A_in  B_in  C_in
Ax    Cz    Cz
By    Exp   Fr
Exp         Exp
Written and tested at tutorialspoint.
I use bash and core-/more-utils to manipulate the file. First I determine the numbers of the columns whose headers carry the _in suffix. Then I take the values stored in the Exp row.
Then I just read the file line by line, field by field, and for every field belonging to a column whose header ends in _in I subtract the corresponding value of the Exp row from that field. I suspect this part is the slowest (I use a plain while IFS=$'\t' read -r -a vals loop), but a clever awk script could speed it up. As you said, this produces your "first desired outcome".
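For what it is worth (this is not part of the original answer), a rough, untested sketch of such an awk pass could look like the following; it assumes the tab-separated inputfile created above and a row whose first field is exactly Exp, and it reads the file twice, like the other answers do:

awk 'BEGIN { FS = OFS = "\t" }
NR==FNR {                                  # first pass: remember *_in columns and the Exp row
    if (FNR == 1) { for (i = 2; i <= NF; i++) if ($i ~ /_in$/) incols[i] = 1 }
    if ($1 == "Exp") for (i in incols) exp_val[i] = $i
    next
}
FNR==1 { print; next }                     # second pass: header line unchanged
{
    for (i in incols) {                    # replace each *_in value by |value - Exp value|
        d = $i - exp_val[i]
        $i = (d < 0 ? -d : d)
    }
    print
}' inputfile inputfile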
Then I only need to output the header names that end in _in. After that, for each column number with the _in suffix, I need to identify the 2 smallest values in that column; I use a plain sort -n -u | head -n2. Then it gets a bit tricky: I have to extract the IDs whose value in such a column equals one of those 2 smallest values. That is a job for grep -f. I use sed to prepare the proper regular expressions from the input and let grep -f /dev/stdin do the filtering.
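As a side note, and again not part of the original answer, the same "IDs of the two smallest values per column" step could be sketched as a single awk pass over the first result file; this untested sketch keeps ties, as in the example, but prints one line per _in column instead of reproducing the exact column layout asked for:

awk 'BEGIN { FS = "\t" }
FNR==1 {                                   # remember which fields are *_in columns
    for (i = 2; i <= NF; i++) if ($i ~ /_in$/) { col[++n] = i; hdr[n] = $i }
    next
}
{ nr++; id[nr] = $1; for (c = 1; c <= n; c++) val[c, nr] = $(col[c]) + 0 }
END {
    for (c = 1; c <= n; c++) {
        m1 = m2 = ""                       # the two smallest distinct values in this column
        for (r = 1; r <= nr; r++) {
            v = val[c, r]
            if      (m1 == "")                          m1 = v
            else if (v < m1)                            { m2 = m1; m1 = v }
            else if (v != m1 && (m2 == "" || v < m2))   m2 = v
        }
        line = hdr[c] ":"
        for (r = 1; r <= nr; r++)          # every ID holding one of those two values (ties kept)
            if (val[c, r] == m1 || val[c, r] == m2) line = line " " id[r]
        print line
    }
}' MyFirstDesiredOutcomeIsThis.txt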
Answer 2 (score: 0)
Please ask only one question at a time. Here is how to do the first thing you asked about:
$ cat tst.awk
BEGIN { OFS="\t" }
NR==FNR { if ($1=="Exp") split($0,exps); next }
FNR==1  { $1=$1; print; next }
{
    for (i=1; i<=NF; i++) {
        val = ( (i-1) % 2 ? $i : exps[i] - $i )
        printf "%s%s", (val < 0 ? -val : val), (i<NF ? OFS : ORS)
    }
}
$ awk -f tst.awk file file
ID  Time  A_in  Time  B_in  Time  C_in
0   0.1   0     0.1   10    0.1   35
0   0.2   2     0.2   10    0.2   20
0   0.3   10    0.3   5     0.3   5
0   0.4   25    0.4   10    0.4   5
0   0.5   0     0.5   0     0.5   0
The above will run efficiently and robustly using any awk in any shell on every UNIX box.
If, after reading this, re-reading the awk answers you have previously received, and looking for relevant help in the awk man page, you still need help with the second thing you want to do, then ask a new, standalone question about just that.