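I have a 559-line text file, and I need to sort the labels in one specific part of each line from the longest string to the shortest. I was considering sort, but I don't really have a delimiter to use, and I am trying to work out the start and end positions to give the -k flag.
Here is a sample of my text file: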
^(.*a)$0UMYBPEB(.*)$1$|\0ybpeb\1
^(.*a)$0UMYBPUK(.*)$1$|\0yuk \1
^(.*a)$0UMYBPUKE(.*)$1$|\0yuke \1
^(.*a)$0USAAHPERD(.*)$1$|\0aahpe\1
^(.*a)$0USAASC(.*)$1$|\0aasc \1
^(.*a)$0USAATF(.*)$1$|\0aatf \1
^(.*a)$0USABARIS(.*)$1$|\0abar \1
^(.*a)$0USABOR(.*)$1$|\0abor \1
^(.*a)$0USACA(.*)$1$|\0aca \1
^(.*a)$0USACI(.*)$1$|\0aci \1
^(.*a)$0USACMLA(.*)$1$|\0acmla\1
^(.*a)$0USACSANZ(.*)$1$|\0acsan\1
^(.*a)$0USACTA(.*)$1$|\0acta \1
^(.*a)$0USACTACLASS(.*)$1$|\0cass \1
^(.*a)$0USAD(.*)$1$|\0adbus\1
^(.*a)$0USADAMMATTHEW(.*)$1$|\0adam \1
^(.*a)$0USAEA(.*)$1$|\0aea \1
^(.*a)$0USAFAS(.*)$1$|\0afas \1
^(.*a)$0USAFRICAN(.*)$1$|\0afric\1
^(.*a)$0USAGI(.*)$1$|\0agi \1
^(.*a)$0USAGO(.*)$1$|\0ago \1
Note that the labels I am referring to are the ones right after $0 and before the (.*) that follows.
The result I want is ordered from the longest label to the shortest:
^(.*a)$0USADAMMATTHEW(.*)$1$|\0adam \1
^(.*a)$0USACTACLASS(.*)$1$|\0cass \1
^(.*a)$0USAFRICAN(.*)$1$|\0afric\1
^(.*a)$0USAAHPERD(.*)$1$|\0aahpe\1
^(.*a)$0USACSANZ(.*)$1$|\0acsan\1
^(.*a)$0UMYBPUKE(.*)$1$|\0yuke \1
^(.*a)$0USABARIS(.*)$1$|\0abar \1
^(.*a)$0USACMLA(.*)$1$|\0acmla\1
^(.*a)$0UMYBPEB(.*)$1$|\0ybpeb\1
^(.*a)$0UMYBPUK(.*)$1$|\0yuk \1
^(.*a)$0USAFAS(.*)$1$|\0afas \1
^(.*a)$0USAASC(.*)$1$|\0aasc \1
^(.*a)$0USAATF(.*)$1$|\0aatf \1
^(.*a)$0USABOR(.*)$1$|\0abor \1
^(.*a)$0USACTA(.*)$1$|\0acta \1
^(.*a)$0USACA(.*)$1$|\0aca \1
^(.*a)$0USACI(.*)$1$|\0aci \1
^(.*a)$0USAEA(.*)$1$|\0aea \1
^(.*a)$0USAGI(.*)$1$|\0agi \1
^(.*a)$0USAGO(.*)$1$|\0ago \1
^(.*a)$0USAD(.*)$1$|\0adbus\1
Answer 0 (score: 2)
You can use perl like this:
perl -ne 'push @Lines,$_;}{print (sort { length($b) <=> length($a) } @Lines)' file
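Each line is read into the array @Lines.
Because -n wraps the code in a while (<>) { ... } loop, the }{ closes that loop and opens a new block, so whatever follows it runs once, after the end of the file has been reached.
sort { length($b) <=> length($a) } @Lines sorts the array by descending length, using sort's special variables $a and $b.
print prints the sorted array.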
Answer 1 (score: 2)
awk (and friends) to the rescue:
awk '{print length($0) "\t" $0}' file | sort -nr | cut -f2-
^(.*a)$0USADAMMATTHEW(.*)$1$|\0adam \1
^(.*a)$0USACTACLASS(.*)$1$|\0cass \1
^(.*a)$0USAFRICAN(.*)$1$|\0afric\1
^(.*a)$0USAAHPERD(.*)$1$|\0aahpe\1
^(.*a)$0USACSANZ(.*)$1$|\0acsan\1
^(.*a)$0USABARIS(.*)$1$|\0abar \1
^(.*a)$0UMYBPUKE(.*)$1$|\0yuke \1
^(.*a)$0USACMLA(.*)$1$|\0acmla\1
^(.*a)$0UMYBPUK(.*)$1$|\0yuk \1
^(.*a)$0UMYBPEB(.*)$1$|\0ybpeb\1
^(.*a)$0USAFAS(.*)$1$|\0afas \1
^(.*a)$0USACTA(.*)$1$|\0acta \1
^(.*a)$0USABOR(.*)$1$|\0abor \1
^(.*a)$0USAATF(.*)$1$|\0aatf \1
^(.*a)$0USAASC(.*)$1$|\0aasc \1
^(.*a)$0USAGO(.*)$1$|\0ago \1
^(.*a)$0USAGI(.*)$1$|\0agi \1
^(.*a)$0USAEA(.*)$1$|\0aea \1
^(.*a)$0USACI(.*)$1$|\0aci \1
^(.*a)$0USACA(.*)$1$|\0aca \1
^(.*a)$0USAD(.*)$1$|\0adbus\1
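The pipeline above keys on the length of the whole line, which tracks label length in the sample only because the rest of each line has a fixed width. If that ever stops holding, the same decorate, sort, undecorate idea can key on the label alone. A minimal sketch, assuming every label is a run of capital letters between $0 and the opening parenthesis that follows; sort_by_label_length is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: sorts the lines of file "$1" by label length, longest first.
# Assumption: each label is a run of capital letters between "$0" and "(".
sort_by_label_length() {
    awk '{
        len = 0
        # Match "$0", then the label, then the "(" that follows it.
        if (match($0, /\$0[A-Z]+\(/))
            len = RLENGTH - 3          # drop "$0" (2 chars) and "(" (1 char)
        print len "\t" $0              # decorate: label length, tab, line
    }' "$1" |
    sort -t "$(printf '\t')" -k1,1nr | # sort on the numeric key only
    cut -f2-                           # undecorate
}
```

Lines without a recognisable label get length 0 and end up last. How sort orders lines with equal keys depends on the implementation and locale, so no tie order is promised here.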
Answer 2 (score: 2)
With a single gawk (GNU awk) command:
gawk '{a[length,NR]=$0}END{n=asorti(a,dest); for(;n>0;n--) print a[dest[n]]}' file
Output:
^(.*a)$0USADAMMATTHEW(.*)$1$|\0adam \1
^(.*a)$0USACTACLASS(.*)$1$|\0cass \1
^(.*a)$0USAAHPERD(.*)$1$|\0aahpe\1
^(.*a)$0USAFRICAN(.*)$1$|\0afric\1
^(.*a)$0USABARIS(.*)$1$|\0abar \1
^(.*a)$0UMYBPUKE(.*)$1$|\0yuke \1
^(.*a)$0USACSANZ(.*)$1$|\0acsan\1
^(.*a)$0UMYBPUK(.*)$1$|\0yuk \1
^(.*a)$0USACMLA(.*)$1$|\0acmla\1
^(.*a)$0UMYBPEB(.*)$1$|\0ybpeb\1
^(.*a)$0USABOR(.*)$1$|\0abor \1
^(.*a)$0USAATF(.*)$1$|\0aatf \1
^(.*a)$0USAASC(.*)$1$|\0aasc \1
^(.*a)$0USAFAS(.*)$1$|\0afas \1
^(.*a)$0USACTA(.*)$1$|\0acta \1
^(.*a)$0USACA(.*)$1$|\0aca \1
^(.*a)$0USAGO(.*)$1$|\0ago \1
^(.*a)$0USAGI(.*)$1$|\0agi \1
^(.*a)$0USAEA(.*)$1$|\0aea \1
^(.*a)$0USACI(.*)$1$|\0aci \1
^(.*a)$0USAD(.*)$1$|\0adbus\1
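length - the length of the current line (with no argument it measures $0)
asorti(source [,dest [,how]]) - sorts the array indices (ascending by default, comparing them as strings)
dest - the result array that receives the sorted indices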
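One caveat worth spelling out: because asorti compares the indices as strings by default, the one-liner orders lines correctly only while every line length has the same number of digits, as in the sample, where the lengths run from 29 to 38. A minimal sketch of one way around this, assuming zero-padded keys; the helper name and the pad width of 10 are illustrative choices, not part of the original answer.

```shell
#!/bin/sh
# Hypothetical helper: prints the lines of file "$1", longest first (needs gawk).
sort_by_length_padded() {
    gawk '{
        # Zero-pad both parts so that string order of the keys equals numeric order.
        a[sprintf("%010d%s%010d", length($0), SUBSEP, NR)] = $0
    }
    END {
        n = asorti(a, dest)            # ascending string sort of the padded keys
        for (; n > 0; n--)             # walk backwards: longest line first
            print a[dest[n]]
    }' "$1"
}
```

With this key, lines of equal length come out in reverse input order, since the padded line number is the second part of the key and the loop walks the sorted keys backwards.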
Answer 3 (score: 1)
Here is another one in GNU awk:
$ gawk '
function cmp_val_len(i1,v1,i2,v2) {   # length-comparing function for asort
    return(length(v2) - length(v1))   # negative when v1 is longer, so longest sorts first
}
{
    a[NR]=$0                          # store each record in a, keyed by line number
}
END {
    n=asort(a,b,"cmp_val_len")        # sort the records into b using the function above
    for(i=1;i<=n;i++)                 # loop over the sorted records and
        print b[i]                    # output each one
}
' file
Output (first lines only):
^(.*a)$0USADAMMATTHEW(.*)$1$|\0adam \1
^(.*a)$0USACTACLASS(.*)$1$|\0cass \1
^(.*a)$0USAAHPERD(.*)$1$|\0aahpe\1
^(.*a)$0USAFRICAN(.*)$1$|\0afric\1
^(.*a)$0UMYBPUKE(.*)$1$|\0yuke \1
....
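The comparison function above returns 0 for two lines of equal length, and as far as I know gawk does not promise any particular order for elements that compare equal. If a reproducible order matters, the function can fall back on the original line number. A minimal sketch under that assumption; cmp_len_then_idx and the wrapper sort_by_length_stable are hypothetical names.

```shell
#!/bin/sh
# Hypothetical helper: longest line first, ties kept in input order (needs gawk).
sort_by_length_stable() {
    gawk '
    # Longest value first; equal lengths fall back to ascending line number.
    function cmp_len_then_idx(i1, v1, i2, v2) {
        if (length(v1) != length(v2))
            return length(v2) - length(v1)
        return i1 - i2                 # indices are the NR values stored below
    }
    { a[NR] = $0 }                     # store each record under its line number
    END {
        n = asort(a, b, "cmp_len_then_idx")
        for (i = 1; i <= n; i++)
            print b[i]
    }' "$1"
}
```

Called as sort_by_length_stable file, it should give the same grouping by length as the answer above, with equal-length lines in their original order.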