Sort a field in ascending order and remove the first and last numbers

Time: 2014-04-21 15:11:39

Tags: python awk

My data looks like this:

a     10,5,3,66,50
b     2,10,1,88,5,8,9
c     4,60,10,39,55,22
d     1,604,3,503,235,45,60,7
e     20,59,33,2,6,45,36,34,22

I want to sort the numbers in the second column in ascending order:

a     3,5,10,50,66
b     1,2,5,8,9,10,88
c     4,10,22,39,55,60
....
....

and then remove the smallest and largest values, like this:

a     5,10,50
b     2,5,8,9,10
c     10,22,39,55
....
....

Any help would be greatly appreciated!

4 answers:

Answer 0 (score: 2)

Here you go:

awk '{l=split($2,a,",");asort(a);printf "%s\t",$1;for(i=2;i<l;i++) printf "%s"(i==l-1?RS:","),a[i]}' t
a       5,10,50
b       2,5,8,9,10
c       10,22,39,55
d       3,7,45,60,235,503
e       6,20,22,33,34,36,45

PS: If I remember correctly, you need GNU awk (gawk) for this, since asort is a gawk extension.

How it works:

awk '
    {l=split($2,a,",")                      # Split the data into array "a" and set "l" to length of array
    asort(a)                                # Sort the array "a"
    printf "%s\t",$1                        # Print the first column
    for(i=2;i<l;i++)                        # Run a loop from second element to second last element in array "a"
        printf "%s"(i==l-1?RS:","),a[i]     # Print each element followed by ",", except the last one, which is followed by a newline (RS)
    }'  file                                # Read the file
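For readers more at home in Python, the awk steps above can be mirrored one by one. This is a rough sketch, not part of the original answer; the helper name awk_style_line is hypothetical. The main thing to watch is indexing: awk arrays filled by split/asort are 1-based, so the loop for(i=2;i<l;i++) corresponds to Python indices 1 through l-2.

```python
def awk_style_line(line):
    """Mimic the awk one-liner for one input line (hypothetical helper)."""
    first, second = line.split()   # $1 and $2
    a = second.split(",")          # l=split($2,a,",")
    l = len(a)
    a.sort(key=int)                # asort(a); the values are numbers, so sort numerically
    # awk's a[2] .. a[l-1] (1-based) are Python's a[1] .. a[l-2] (0-based)
    middle = [a[i] for i in range(1, l - 1)]
    return first + "\t" + ",".join(middle)


print(awk_style_line("a     10,5,3,66,50"))  # prints "a", a tab, then "5,10,50"
```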

Answer 1 (score: 1)

Well, here is an alternative solution using perl:

$ perl -F'\s+|,' -lane '
print $F[0] . "\t" . join "," , splice @{[sort { $a<=>$b } @F[1..$#F]]} , 1, $#F-2' file
a       5,10,50
b       2,5,8,9,10
c       10,22,39,55
d       3,7,45,60,235,503
e       6,20,22,33,34,36,45

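Or, on perl 5.14 through 5.22, where splice could take an array reference as an experimental feature (removed in perl 5.24), you can drop the @{..} and write: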

perl -F'\s+|,' -lane '
    print $F[0] . "\t" . join "," , splice [sort { $a<=>$b } @F[1..$#F]] , 1, $#F-2
' file

Or just use a list slice:

perl -F'\s+|,' -lane '
    print $F[0] . "\t" . join "," , ( sort { $a<=>$b }@F[1..$#F] ) [1..$#F-2]
' file
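The splice and slice variants differ only in how they drop the ends: splice removes $#F-2 elements starting at offset 1 from a temporary sorted array and returns them, while the list slice picks indices 1 .. $#F-2 of the sorted list. A rough Python analogue, for illustration only (the variable names are hypothetical), shows that both give the same middle values for row c:

```python
import re

line = "c     4,60,10,39,55,22"
F = re.split(r"\s+|,", line.strip())      # like perl -F'\s+|,': ['c', '4', '60', ...]
last = len(F) - 1                         # perl's $#F (index of the last element)
label = F[0]
numbers = sorted(F[1:last + 1], key=int)  # sort { $a<=>$b } @F[1..$#F]

# splice-style: take $#F-2 elements starting at offset 1
spliced = numbers[1:1 + (last - 2)]

# slice-style: pick indices 1 .. $#F-2 (inclusive) of the sorted list
sliced = [numbers[i] for i in range(1, last - 2 + 1)]

assert spliced == sliced
print(label + "\t" + ",".join(spliced))   # prints "c", a tab, then "10,22,39,55"
```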

Answer 2 (score: 0)

Python:

with open('the_file.txt', 'r') as fin, open('result.txt', 'w') as fout:
    for line in fin:
        f0, f1 = line.split() 
        fout.write('%s\t%s\n' % (f0, ','.join(sorted(f1.split(','), key=int)[1:-1])))

The loop body can be expanded into:

        f0, f1 = line.split()           # split fields on whitespace
        items = f1.split(',')           # split second field on commas
        items = sorted(items, key=int)  # or items.sort(key=int) # sorts items as int
        items = items[1:-1]             # get rid of first and last items
        f1 = ','.join(items)            # reassemble field as csv
        line = '%s\t%s\n' % (f0, f1)    # reassemble line
        fout.write(line)                # write it out
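The key=int argument matters here: without it, sorted compares the fields as strings, so "10" sorts before "3" and "5", and the wrong values get trimmed. A quick check using row a from the question:

```python
items = "10,5,3,66,50".split(",")

as_text = sorted(items)              # lexicographic: ['10', '3', '5', '50', '66']
as_numbers = sorted(items, key=int)  # numeric:       ['3', '5', '10', '50', '66']

print(",".join(as_text[1:-1]))       # 3,5,50  (wrong: 3 is the minimum)
print(",".join(as_numbers[1:-1]))    # 5,10,50 (matches the expected output)
```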

Answer 3 (score: 0)

A complete Python example. This assumes your data is in a text file. You would call it like this:

./parser.py filename

Or you can pipe data into it like this:

echo 'a    3,2,1,4,5' | ./parser.py -

Code:

#!/usr/bin/env python
import argparse
import sys

def splitAndTrim(d):
    line = str.split(d)
    arr = sorted(map(int, line[1].split(',')))
    print("{0}    {1}".format(line[0], ",".join(map(str, arr[1:-1]))))


if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument('FILE', nargs='?', type=argparse.FileType('r'), default=sys.stdin)
    args = parser.parse_args(sys.argv[1:])
    for line in args.FILE:
        splitAndTrim(line)
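The script above assumes every line has two whitespace-separated fields; a blank line (for example a trailing empty line in the file) makes line[1] raise an IndexError. A slightly more defensive variant is sketched below. The name split_and_trim, and the choices to return None for blank lines and an empty list for rows with fewer than three numbers, are assumptions rather than part of the original answer.

```python
def split_and_trim(line):
    """Return 'label<TAB>middle values', or None for a blank line (assumed behavior)."""
    parts = line.split()
    if not parts:
        return None  # skip blank lines instead of raising IndexError
    label = parts[0]
    values = sorted(map(int, parts[1].split(","))) if len(parts) > 1 else []
    # With fewer than three values, values[1:-1] is simply empty
    return "{0}\t{1}".format(label, ",".join(map(str, values[1:-1])))
```

In the main loop, it could stand in for splitAndTrim, printing each result that is not None.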