Question

We have a file, input.txt, containing a matrix of 5 rows by 9 columns (the real file has 10K+ rows and 40K+ columns):

Col1    Col2    Col3    Col4    Col5    Col6    Col7    Col8    Col9
0.8 1   0.8 0.6 0.9 0.4 0.3 0.1 0.6
1   0.6 0.5 0.6 0.3 0.1 0.2 0.5 0.2
0.4 0.5 0.1 0.7 0.8 0.8 0.6 0.3 0.3
0.9 0.2 1   0.1 0.9 0.8 0.6 0.9 0.2
0.9 1   0.2 0.5 0.5 0.7 0.5 0.3 0.2

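Note 1: The file has no header row; the header is shown here for reference only.
Note 2: The solution must scale to the real data, which has 40K+ columns.
Note 3: I added the python and perl tags; whichever performs better is fine.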

It needs to be converted into output.txt as shown below, a matrix of 5 rows by 3 columns:

Col1    Col2    Col3
2.6 1.7 1.3
1.6 0.5 0.9
0.7 2.4 0.9
2.2 2.5 1.3
1.4 1.9 0.7

Logic:

Output_Col1 = (Input_Col2) + (Input_Col3*2)
Output_Col2 = (Input_Col5) + (Input_Col6*2)
Output_Col3 = (Input_Col8) + (Input_Col9*2)
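Since the python tag was added, here is a minimal Python sketch of the logic above, generalized to any number of three-column groups. It assumes whitespace-separated numeric fields and that the pattern from the question continues (skip the first column of each group, add the second to twice the third). The names transform_line and transform are hypothetical, and the "g" format is chosen only to hide floating-point noise such as 0.30000000000000004.

```python
import io


def transform_line(line):
    """Return the output fields for one whitespace-separated input line."""
    fields = line.split()
    # Zero-based indices 1, 4, 7, ... are Input_Col2, Input_Col5, Input_Col8, ...
    return [
        f"{float(fields[i]) + 2 * float(fields[i + 1]):g}"
        for i in range(1, len(fields) - 1, 3)
    ]


def transform(src, dst, sep=" "):
    """Stream lines from src to dst so the whole matrix is never held in memory."""
    for line in src:
        if line.strip():  # skip blank lines
            dst.write(sep.join(transform_line(line)) + "\n")


if __name__ == "__main__":
    sample = io.StringIO("0.8 1   0.8 0.6 0.9 0.4 0.3 0.1 0.6\n")
    out = io.StringIO()
    transform(sample, out)
    print(out.getvalue(), end="")
```

For the real file, src and dst would be the file objects returned by open("input.txt") and open("output.txt", "w").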

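My attempt so far: I tried extracting every second and every third column of the matrix into separate files, so that I could multiply the second file by 2 and then sum the two files... there is probably a simpler way.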

ncol=9
cut -d" " -f`seq -s "," 2 3 $ncol` input.txt > col2s.txt
cut -d" " -f`seq -s "," 3 3 $ncol` input.txt > col3s.txt

Answer 1

Perl to the rescue:

perl -lane 'print join "\t", $F[1] + $F[2] * 2, $F[4] + $F[5] * 2, $F[7] + $F[8] * 2' input.txt > output.txt

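Explanation:

-l appends a newline to each print
-a splits each line into the @F array
-n reads the input line by line and runs the code for each line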

If you need to process more pairs of adjacent columns, you can use a shorter notation:

print join "\t", map $F[$_] + $F[$_ + 1] * 2, 1, 4, 7

(Replace 1, 4, 7 with the actual list of zero-based indices of the left column of each pair.)
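Writing that index list by hand is not practical for 40K+ columns. The following Python sketch generates it, assuming the groups of three from the question repeat across the whole line. The helper name left_indices and its parameters are hypothetical.

```python
def left_indices(ncol, group=3, offset=1):
    """Zero-based index of the left column of each pair, for a line of ncol fields."""
    # Stop at ncol - 1 so that index + 1 (the right column) always exists.
    return list(range(offset, ncol - 1, group))


if __name__ == "__main__":
    # Comma-separated form, ready to paste in place of "1, 4, 7".
    print(", ".join(str(i) for i in left_indices(9)))
```

For 9 columns this prints 1, 4, 7, matching the list used above.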

Answer 2

In awk:

awk '{print ($2+($3*2)),($5+($6*2)),($8+($9*2))}' input.txt

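For every record, this simply prints the three expressions you asked for. The file has no header line, so the first record needs no special handling.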

Scalable version:

awk '{for(i=2;i<=NF;i+=3)x=(x?x FS:"")($i+($(i+1)*2));print x;x=y}' file

Output:

2.6 1.7 1.3
1.6 0.5 0.9
0.7 2.4 0.9
2.2 2.5 1.3
1.4 1.9 0.7

Answer 3

Using bc and a loop:

(Edit: got rid of the UUOC, the useless use of cat.)

while read c1 c2 c3 c4 c5 c6 c7 c8 c9 c_others; do
        out1=$(echo $c2 + $c3 \* 2 | bc)
        out2=$(echo $c5 + $c6 \* 2 | bc)
        out3=$(echo $c8 + $c9 \* 2 | bc)
        echo ${out1} ${out2} ${out3}
done < input.txt

Result:

2.6 1.7 1.3
1.6 .5 .9
.7 2.4 .9
2.2 2.5 1.3
1.4 1.9 .7

If you want a leading 0 for floats < 1, you can post-process the output:

| sed -e 's/ \./ 0./g' -e 's/^\./0./'

I am afraid the extra sed pass will make this the slowest solution.
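For readers more at home in Python, the two sed substitutions can be sketched as a single regular expression. This is only an illustration of what the sed expressions do; the function name add_leading_zero is hypothetical, and like the sed version it does not handle negative values such as -.5.

```python
import re


def add_leading_zero(line):
    """Insert a 0 before any field that starts with a bare decimal point."""
    # Same effect as: sed -e 's/ \./ 0./g' -e 's/^\./0./'
    return re.sub(r"(^| )\.", r"\g<1>0.", line)


if __name__ == "__main__":
    print(add_leading_zero(".7 2.4 .9"))
```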

Answer 4

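Given the size of the file, my preference would be to handle it with a dedicated program in Python or Perl, or even in a compiled language such as C/C++. That is likely to be much faster than a shell script, and it allows better error handling.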
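As a rough illustration of the error handling a dedicated program makes possible, here is a Python sketch that reports the file name and line number of malformed input. It assumes every line holds a multiple of three whitespace-separated numbers, which may not be true of the real data; the function name convert and the tab-separated output are choices made for this sketch only.

```python
def convert(in_path, out_path):
    """Write Col2 + 2*Col3 for each group of three columns, one row per input line."""
    with open(in_path) as src, open(out_path, "w") as dst:
        for lineno, line in enumerate(src, start=1):
            fields = line.split()
            if not fields:
                continue  # ignore blank lines
            if len(fields) % 3:
                # Assumption: a column count that is not a multiple of 3 is an error.
                raise ValueError(
                    f"{in_path}:{lineno}: expected a multiple of 3 columns, "
                    f"got {len(fields)}"
                )
            try:
                values = [float(f) for f in fields]
            except ValueError as exc:
                # Re-raise with the position so the bad line is easy to find.
                raise ValueError(f"{in_path}:{lineno}: {exc}") from exc
            sums = (values[i] + 2 * values[i + 1] for i in range(1, len(values), 3))
            dst.write("\t".join(f"{s:g}" for s in sums) + "\n")
```

Calling convert("input.txt", "output.txt") would process the sample file from the question.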

With the shell, you could use something like the following:

# Outer loop: deal with each line in the file.
while read -r line
do
    # Inner loop: deal with each group of three columns on the line.
    while [[ ${line} ]]
    do
        # %s rather than %d, so the fractional part is not truncated.
        echo ${line} | cut -d' ' -f1-3 | awk '{printf("%s\t",$2+($3*2))}'
        # Drop the three columns that were just processed.
        line=$(echo ${line} | cut -s -d' ' -f4-)
    done
    printf "\n"
done < my_file

Multiplying and summing values in a file
