Question

I am trying to rearrange file1, which is already sorted by its last column, as shown below:

MEL P 20190731 0453 30.599
PUS P 20190731 0453 30.612
MEA P 20190731 0453 30.620
KDT P 20190731 0453 30.639
PAS P 20190731 0453 30.644
BDT P 20190731 0453 30.900
LAB P 20190731 0453 31.046
KLS P 20190731 0453 31.129
MEL S 20190731 0453 31.222
KDT S 20190731 0453 31.249
PAS S 20190731 0453 31.255
MEA S 20190731 0453 31.258
GRA P 20190731 0453 31.263
BDT S 20190731 0453 31.551
LAB S 20190731 0453 31.630
GRA S 20190731 0453 31.816

In output, I want all lines that share the same string in the first column to be grouped next to each other, for example:

MEL P 20190731 0453 30.599
MEL S 20190731 0453 31.222
PUS P 20190731 0453 30.612
MEA P 20190731 0453 30.620
MEA S 20190731 0453 31.258
KDT P 20190731 0453 30.639
KDT S 20190731 0453 31.249
PAS P 20190731 0453 30.644
PAS S 20190731 0453 31.255
BDT P 20190731 0453 30.900
BDT S 20190731 0453 31.551
LAB P 20190731 0453 31.046
LAB S 20190731 0453 31.630
KLS P 20190731 0453 31.129
GRA P 20190731 0453 31.263
GRA S 20190731 0453 31.816

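while still respecting the order of the last column (note that, for example, the MEL lines are now next to each other, and the position of PUS relative to the others has not changed).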

I have tried this code to generate a key:

awk '!array[$1]++ {print $1}' file1 > key

Then I tried to match it against file1 so that I could reorder the lines using

grep -Fwf key file > output

but nothing changes. Please help!

Answer 1

With awk:

$ awk '{
    if(!($1 in a))           # enumerate all unique $1 for looping in END
        n[++c]=$1
    a[$1]=a[$1] $0 ORS       # append records to hash keyed on $1
}
END {                        # after processing records
    for(i=1;i<=c;i++)        # loop 
        printf "%s",a[n[i]]  # and output
}' file

Output:

MEL P 20190731 0453 30.599
MEL S 20190731 0453 31.222
PUS P 20190731 0453 30.612
MEA P 20190731 0453 30.620
MEA S 20190731 0453 31.258
KDT P 20190731 0453 30.639
KDT S 20190731 0453 31.249
PAS P 20190731 0453 30.644
PAS S 20190731 0453 31.255
BDT P 20190731 0453 30.900
BDT S 20190731 0453 31.551
LAB P 20190731 0453 31.046
LAB S 20190731 0453 31.630
KLS P 20190731 0453 31.129
GRA P 20190731 0453 31.263
GRA S 20190731 0453 31.816

It expects the data to be sorted on the last field already.
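To check the behaviour on a smaller input, the same idea can be wrapped in a shell function that reads standard input. This is only a sketch: the function name group_by_first and the three-line sample are made up for illustration and are not part of the answer above.

```shell
# group_by_first is a hypothetical helper name.
# It reads lines on stdin and prints them grouped by field 1, with groups
# in order of first appearance and lines inside a group in input order.
group_by_first() {
    awk '
        !($1 in a) { n[++c] = $1 }     # remember each new key in arrival order
        { a[$1] = a[$1] $0 ORS }       # append the record to its key bucket
        END { for (i = 1; i <= c; i++) printf "%s", a[n[i]] }
    '
}

# Made-up sample: key A appears on lines 1 and 3.
printf 'A P 1.0\nB P 2.0\nA S 3.0\n' | group_by_first
# A P 1.0
# A S 3.0
# B P 2.0
```

Because every record is buffered until END, memory use grows with the size of the input.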

Answer 2

Using GNU sort for its -s (stable sort) option:

$ awk '!($1 in a){a[$1]=NR} {print a[$1], $0}' file | sort -s -k1,1n | cut -d' ' -f2-
MEL P 20190731 0453 30.599
MEL S 20190731 0453 31.222
PUS P 20190731 0453 30.612
MEA P 20190731 0453 30.620
MEA S 20190731 0453 31.258
KDT P 20190731 0453 30.639
KDT S 20190731 0453 31.249
PAS P 20190731 0453 30.644
PAS S 20190731 0453 31.255
BDT P 20190731 0453 30.900
BDT S 20190731 0453 31.551
LAB P 20190731 0453 31.046
LAB S 20190731 0453 31.630
KLS P 20190731 0453 31.129
GRA P 20190731 0453 31.263
GRA S 20190731 0453 31.816

With any sort:

$ awk '!($1 in a){a[$1]=NR} {print a[$1], NR, $0}' file | sort -k1,1n -k2,2n | cut -d' ' -f3-
MEL P 20190731 0453 30.599
MEL S 20190731 0453 31.222
PUS P 20190731 0453 30.612
MEA P 20190731 0453 30.620
MEA S 20190731 0453 31.258
KDT P 20190731 0453 30.639
KDT S 20190731 0453 31.249
PAS P 20190731 0453 30.644
PAS S 20190731 0453 31.255
BDT P 20190731 0453 30.900
BDT S 20190731 0453 31.551
LAB P 20190731 0453 31.046
LAB S 20190731 0453 31.630
KLS P 20190731 0453 31.129
GRA P 20190731 0453 31.263
GRA S 20190731 0453 31.816
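Both pipelines follow a decorate, sort, undecorate pattern: awk prepends the line number at which each key first appeared, sort orders by that number, and cut removes it again. A sketch of just the decorate step on a made-up three-line sample shows what sort receives (the function names decorate and regroup are hypothetical):

```shell
# decorate: prefix each stdin line with "<first line number of its key> <own line number>".
decorate() {
    awk '!($1 in a) { a[$1] = NR } { print a[$1], NR, $0 }'
}

# regroup: full pipeline, sorting on both numeric prefixes and then dropping them.
regroup() {
    decorate | sort -k1,1n -k2,2n | cut -d' ' -f3-
}

printf 'A P 1.0\nB P 2.0\nA S 3.0\n' | decorate
# 1 1 A P 1.0
# 2 2 B P 2.0
# 1 3 A S 3.0
```

Since the second prefix (NR) is unique per line, the two-key sort never has to break a tie, which is why this variant does not need a stable sort.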

Answer 3

I believe you are looking for a "stable sort" [0]. Something like this:

 sort -s -k5,5n -k1,1 file1 > output

(or whatever -k keys are appropriate)

[0] https://en.wikipedia.org/wiki/Sorting_algorithm#Stability

From the man page:

       -s, --stable
              stabilize sort by disabling last-resort comparison
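
To see what -s changes, compare a stable and a default sort on a key with ties. This is a sketch on a made-up sample, assuming a sort that supports -s (GNU sort does); LC_ALL=C is set only to make the comparison order predictable.

```shell
# sample is a hypothetical helper that prints three made-up lines;
# the key "b" appears twice.
sample() { printf 'b 2\na 9\nb 1\n'; }

# Stable: ties on field 1 keep their input order ("b 2" stays before "b 1").
sample | LC_ALL=C sort -s -k1,1
# a 9
# b 2
# b 1

# Default: ties are broken by a last-resort whole-line comparison,
# so "b 1" moves ahead of "b 2".
sample | LC_ALL=C sort -k1,1
# a 9
# b 1
# b 2
```

Note that a stable sort on field 1 alone orders the groups alphabetically, not by first appearance.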

Answer 4

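Beginner's answer:

cat file1 | sort -s -t' ' makes more sense to me (and is much simpler) than what I am about to offer, but if you insist on the odd ordering in your desired output, here is a bash script that does what you want.

The strategy is to assign each line an incrementing counter based on the contents of its first field. If the first field repeats an entry from an earlier line, the line gets the counter of that earlier occurrence: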

1 MEL P 20190731 0453 30.599
2 PUS P 20190731 0453 30.612
3 MEA P 20190731 0453 30.620
4 KDT P 20190731 0453 30.639
5 PAS P 20190731 0453 30.644
6 BDT P 20190731 0453 30.900
7 LAB P 20190731 0453 31.046
8 KLS P 20190731 0453 31.129
1 MEL S 20190731 0453 31.222
4 KDT S 20190731 0453 31.249
5 PAS S 20190731 0453 31.255
3 MEA S 20190731 0453 31.258
13 GRA P 20190731 0453 31.263
6 BDT S 20190731 0453 31.551
7 LAB S 20190731 0453 31.630
13 GRA S 20190731 0453 31.816

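You can see that "MEL" appears on lines 1 and 9. Because "MEL" first appears on line 1, the counter value "1" is applied to both line 1 and line 9. The same goes for "KDT" on lines 4 and 10: they share one counter value (4 in this case). The counter is simply the line number of a key's first occurrence, which is why "GRA" gets 13. It is determined by a messy and inefficient use of cat, grep, cut, and head.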
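
The counter lookup for a single key can be sketched as follows. The helper name first_line_of and the short sample are assumptions for illustration; the pattern is anchored to the start of the line so that only the first field is matched.

```shell
# first_line_of KEY: print the number of the first stdin line whose
# first field is KEY (hypothetical helper; assumes KEY contains no
# regex metacharacters).
first_line_of() {
    grep -n -E "^$1( |\$)" | head -n1 | cut -d':' -f1
}

printf 'MEL P\nPUS P\nMEL S\n' | first_line_of MEL   # prints 1
printf 'MEL P\nPUS P\nMEL S\n' | first_line_of PUS   # prints 2
```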

Then sort by the incrementing counter. The result is:

1 MEL P 20190731 0453 30.599
1 MEL S 20190731 0453 31.222
2 PUS P 20190731 0453 30.612
3 MEA P 20190731 0453 30.620
3 MEA S 20190731 0453 31.258
4 KDT P 20190731 0453 30.639
4 KDT S 20190731 0453 31.249
5 PAS P 20190731 0453 30.644
5 PAS S 20190731 0453 31.255
6 BDT P 20190731 0453 30.900
6 BDT S 20190731 0453 31.551
7 LAB P 20190731 0453 31.046
7 LAB S 20190731 0453 31.630
8 KLS P 20190731 0453 31.129
13 GRA P 20190731 0453 31.263
13 GRA S 20190731 0453 31.816

Cut off the counter column and you have the desired output.

Here is the script. Run it as $ /bin/bash stablenosort.sh file1

#!/bin/bash

# Description: Stable sorts (?) by first space-delimited field without
#   sorting by that field.
# Usage: stablenosort.sh [file]
# Ref/attrib:
#  [1]: Trim blank lines: https://stackoverflow.com/a/29549497/10850071

FILEIN="$1"

if [ ! -f "$FILEIN" ]; then
    echo "Usage: $0 [file]" >&2
    exit 1
fi

OUTPUT=""
# The "|| [ -n "$line" ]" part keeps a final line that lacks a trailing newline.
while IFS= read -r line || [ -n "$line" ]; do
    # Generate incrementing label from field1
    FIELD1="$(printf '%s\n' "$line" | awk '{print $1}' | head -n1)" # get field 1
    # Label = number of the first line whose first field is FIELD1
    # (anchored match; assumes FIELD1 contains no regex metacharacters).
    INCR_LABEL="$(cat "$FILEIN" | grep -n -E "^$FIELD1( |\$)" | cut -d':' -f1 | head -n1)"
    OUTPUT="$OUTPUT$INCR_LABEL $line"$'\n' # Prepend incrementing label to fields
done < "$FILEIN"

# Stable-sort by the label field only, then cut the label off
OUTPUT=$(printf '%s' "$OUTPUT" | sort -s -t' ' -k1,1n | cut -d' ' -f2-)
OUTPUT=$(printf '%s\n' "$OUTPUT" | awk 'NF' - ) # Trim blank lines. See [1].
printf '%s\n' "$OUTPUT" # print final OUTPUT.

Rearrange lines that begin with the same string

4 answers: