Sorting columns from a file in Bash

Time: 2015-04-22 06:40:42

Tags: bash shell sorting

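I have the following shell script, which reads data from a file given on the command line. The file is a matrix of numbers, and I need to split the file by columns and then sort the columns. Right now I can read the file and output the individual columns, but I am lost on how to sort them. I put in a sort statement, but it only sorts the first column.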

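EDIT: I have decided to take a different approach and actually transpose the matrix to turn the columns into rows. I have to calculate the mean and median later, and I have already done this successfully for the file's rows earlier in the script, so it was suggested that I try to "rotate" the matrix so that the columns become rows.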

Here is my updated code:

    declare -a col=( )
    read -a line < "$1"
    numCols=${#line[@]}                          # save number of columns

    index=0
    while read -a line ; do
        for (( colCount=0; colCount<${#line[@]}; colCount++ )); do
            col[$index]=${line[$colCount]}
            ((index++))
        done
    done < "$1"

    for (( width = 0; width < numCols; width++ )); do
        for (( colCount = width; colCount < ${#col[@]}; colCount += numCols )); do
            printf "%s\t" "${col[$colCount]}"
        done
        printf "\n"
    done

This gives me the following output:

    1 9 6 3 3 6
    1 3 7 6 4 4
    1 4 8 8 2 4
    1 5 9 9 1 7
    1 5 7 1 4 7

whereas what I am looking for now is:

    1 3 3 6 6 9
    1 3 4 4 6 7
    1 2 4 4 8 8
    1 1 5 7 9 9
    1 1 4 5 7 7

To try to sort the data, I tried the following:

    sortCol=${col[$colCount]}
    eval col[$colCount]='($(sort <<<"${'$sortCol'[*]}"))'

Also: (this is how I sorted the rows after reading them in)

    sortCol=( $(printf '%s\t' "${col[$colCount]}" | sort -n) )
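
One likely reason the second attempt does not reorder anything is that `sort` compares whole lines, and `printf '%s\t'` puts the values on a single line. A minimal sketch of sorting one row's values, assuming an external `sort` and `paste` are acceptable; the function name `sort_values` is hypothetical and not part of the script above:

```shell
#!/bin/bash

# sort_values (hypothetical helper): print all arguments in ascending
# numeric order on a single line, separated by spaces.
sort_values() {
    # Emit one value per line so that sort -n has lines to reorder,
    # then join the sorted lines back into one line.
    printf '%s\n' "$@" | sort -n | paste -sd ' ' -
}

# First row of the transposed matrix shown above.
sort_values 1 9 6 3 3 6
```

Inside the script, the values of one transposed row could be collected into an array and passed in, for example `sortedRow=( $(sort_values "${rowValues[@]}") )`, where `rowValues` is also an assumed name.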

Any insight you can offer would be greatly appreciated!

4 Answers:

Answer 0: (score: 1)

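Note: as mentioned in the comments, a pure bash solution is not pretty. There are a number of ways to do it, but this is probably the most straightforward. The following reads all the values of every line into one array and saves the matrix stride, so that the values of each column can be collected into a row and sorted. All sorted columns are inserted into a new row matrix, a2. Transposing that row matrix returns the original matrix in column-sorted order.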

Note this works for a matrix with any number of columns in the file.

#!/bin/bash

test -z "$1" && {           ## validate number of input
    printf "insufficient input. usage:  %s <filename>\n" "${0//*\//}"
    exit 1;
}

test -r "$1" || {           ## validate file was readable
    printf "error: file not readable '%s'. usage:  %s <filename>\n" "$1" "${0//*\//}"
    exit 1;
}

## function: my sort integer array - accepts array and returns sorted array
## Usage: array=( $(msia "${array[@]}") )
msia() {
    local a=( "$@" )
    local sz=${#a[@]}
    local i j _tmp
    ## fewer than 2 elements: nothing to sort, return the input unchanged
    [[ $sz -lt 2 ]] && { echo "${a[@]}"; return 0; }
    for ((i = 0; i < sz; i++)); do
        for ((j = sz - 1; j > i; j--)); do
            ## move the smallest remaining value into position i
            [[ ${a[i]} -gt ${a[j]} ]] && {
                _tmp=${a[i]}
                a[i]=${a[j]}
                a[j]=$_tmp
            }
        done
    done
    echo "${a[@]}"
    return 0
}

declare -a a1               ## declare arrays and matrix variables
declare -a a2
declare -i cnt=0
declare -i stride=0
declare -i sz=0

while read line; do         ## read all lines into array
    a1+=( $line );
    (( cnt == 0 )) && stride=${#a1[@]}  ## calculate matrix stride
    (( cnt++ ))
done < "$1"

sz=${#a1[@]}                ## calculate matrix size
                            ## print original array
printf "\noriginal array:\n\n"
for ((i = 0; i < sz; i += stride)); do
    for ((j = 0; j < stride; j++)); do
        printf " %s" ${a1[i+j]}
    done
    printf "\n"
done

                            ## sort columns from stride array
for ((j = 0; j < stride; j++)); do
    for ((i = 0; i < sz; i += stride)); do
        arow+=( ${a1[i+j]} )
    done
    a2+=( $(msia ${arow[@]}) )  ## create sorted array
    unset arow
done
                            ## print the sorted array
printf "\nsorted array:\n\n"
for ((j = 0; j < cnt; j++)); do
    for ((i = 0; i < sz; i += cnt)); do
        printf " %s" ${a2[i+j]}
    done
    printf "\n"
done

exit 0

Output

$ bash sort_cols2.sh dat/matrix.txt

original array:

 1 1 1 1 1
 9 3 4 5 5
 6 7 8 9 7
 3 6 8 9 1
 3 4 2 1 4
 6 4 4 7 7

sorted array:

 1 1 1 1 1
 3 3 2 1 1
 3 4 4 5 4
 6 4 4 7 5
 6 6 8 9 7
 9 7 8 9 7
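
If external tools are acceptable (so not the pure bash approach of this answer), the same result can probably be reached with less code. A minimal sketch, assuming whitespace-separated values and a rectangular matrix; the function name `sort_columns` and the temporary file names are hypothetical:

```shell
#!/bin/bash

# sort_columns FILE (hypothetical helper): print FILE with every column
# sorted numerically. Uses awk, sort and paste rather than pure bash.
sort_columns() {
    local file=$1 tmpdir c
    local -a first parts=()
    read -ra first < "$file"                 # first row gives the column count
    (( ${#first[@]} > 0 )) || return 1       # empty file: nothing to do
    tmpdir=$(mktemp -d) || return 1
    for (( c = 1; c <= ${#first[@]}; c++ )); do
        # Extract column c and sort it numerically into its own file.
        awk -v c="$c" '{ print $c }' "$file" | sort -n > "$tmpdir/col$c"
        parts+=( "$tmpdir/col$c" )
    done
    paste -d ' ' "${parts[@]}"               # glue the sorted columns together
    rm -rf "$tmpdir"
}

# Demo with the matrix used in this answer.
matrix=$(mktemp)
cat > "$matrix" <<'EOF'
1 1 1 1 1
9 3 4 5 5
6 7 8 9 7
3 6 8 9 1
3 4 2 1 4
6 4 4 7 7
EOF
sort_columns "$matrix"
rm -f "$matrix"
```

The list of per-column files is kept in an array instead of a glob so that column 10 does not sort before column 2 when the pieces are pasted back together.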

Answer 1: (score: 0)

Awk script (asort is a GNU awk extension, so this needs gawk):

awk '
{for(i=1;i<=NF;i++)a[i]=a[i]" "$i}      #Add to column array
END{
        for(i=1;i<=NF;i++){
                split(a[i],b)          #Split column
                x=asort(b)             #sort column
                for(j=1;j<=x;j++){     #loop through sort
                        d[j]=d[j](d[j]~/./?" ":"")b[j]  #Recreate lines
                }
        }
for(i=1;i<=NR;i++)print d[i]          #Print lines
}' file

Output

1 1 1 1 1
3 3 2 1 1
3 4 4 5 4
6 4 4 7 5
6 6 8 9 7
9 7 8 9 7

Answer 2: (score: 0)

Here is my entry for this little exercise. It should handle any number of columns. I assume they are space-separated:

#!/bin/bash

linenumber=0
while read -r line; do
        i=0
        # Create an array for each column.
        for number in $line; do
                [ "$linenumber" -eq 0 ] && eval "array$i=()"
                eval "array$i+=($number)"
                (( i++ ))
        done
        (( linenumber++ ))
done <"$1"
numcols=$i      # i ends one past the last column index
IFS=$'\n'
# Sort each column numerically
for (( j = 0; j < numcols; j++ )); do
        thisarray=array$j
        eval array$j='($(sort -n <<<"${'$thisarray'[*]}"))'
done
# Print each array's 0'th entry, then 1, then 2, etc...
for (( k = 0; k < ${#array0[@]}; k++ )); do
        for (( j = 0; j < numcols; j++ )); do
                eval 'printf "%s " "${array'$j'['$k']}"'
        done
        echo ""
done
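
The `eval` calls are only needed because bash has no two-dimensional arrays. A sketch of one alternative, assuming bash 4 or later (for `mapfile`) and a rectangular matrix: each column is collected as a newline-separated string, sorted with `sort -n`, and stored in a flat array indexed by `row * ncols + col`. The function name `sort_matrix_columns` is hypothetical:

```shell
#!/bin/bash

# sort_matrix_columns FILE (hypothetical helper): print FILE with every
# column sorted numerically, without eval. Assumes bash 4+ for mapfile.
sort_matrix_columns() {
    local file=$1 c k start ncols nrows=0
    local -a fields cols=() flat=() sorted
    # Collect each column as a newline-terminated list of values.
    while read -ra fields || (( ${#fields[@]} )); do
        for c in "${!fields[@]}"; do
            cols[c]+="${fields[c]}"$'\n'
        done
    done < "$file"
    ncols=${#cols[@]}
    for (( c = 0; c < ncols; c++ )); do
        # Sort one column and read the result back, one value per element.
        mapfile -t sorted < <(printf '%s' "${cols[c]}" | sort -n)
        nrows=${#sorted[@]}
        for (( k = 0; k < nrows; k++ )); do
            flat[k * ncols + c]=${sorted[k]}     # row-major position
        done
    done
    # Print the flat array back out, ncols values per line.
    for (( k = 0; k < nrows; k++ )); do
        start=$(( k * ncols ))
        echo "${flat[@]:start:ncols}"
    done
}

# Demo on a small 3x2 matrix.
sort_matrix_columns <(printf '9 3\n6 7\n3 6\n')
```

Keeping everything in ordinary indexed arrays avoids building variable names at run time, at the cost of doing the row and column arithmetic by hand.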

Answer 3: (score: 0)

Not bash, but I think this Python code is worth a look, as it shows how to do the task with built-in functions.

From the interpreter:

$ cat matrix.txt 
1 1 1 1 1
9 3 4 5 5
6 7 8 9 7
3 6 8 9 1
3 4 2 1 4
6 4 4 7 7

$ python
Python 2.7.3 (default, Jun 19 2012, 17:11:17) 
[GCC 4.4.3] on hp-ux11
Type "help", "copyright", "credits" or "license" for more information.
>>>
>>> f = open('./matrix.txt')
>>> for row in zip(*[sorted(list(a))
...                for a in zip(*[a.split() for a in f.readlines()])]):
...    print ' '.join(row)
... 
1 1 1 1 1
3 3 2 1 1
3 4 4 5 4
6 4 4 7 5
6 6 8 9 7
9 7 8 9 7