How do I select a specific percentage of rows?

Time: 2019-03-04 11:02:21

Tags: bash awk

Good morning!

I have a file.csv with 140 rows and 26 columns. I need to sort the rows by the value in column 23. Here is a sample:

Controller1,NA,ASHEBORO,ASH,B,,3674,4572,1814,3674,4572,1814,1859,#NAME?,0,124.45%,49.39%,19%,1,,"Big Risk, No Spare disk",45.04%,4.35%,12.63%,160,464,,,,,,0,1,1,1,0,410,65%,1.1,1.1,1.3,0.65,0.65,0.75,0.04,0.1,,,,,,,,,
Controller2,EU,FR,URG,D,,0,0,0,0,0,0,0,#NAME?,0,#DIV/0!,#DIV/0!,#DIV/0!,1,,#N/A,0.00%,0.00%,#DIV/0!,NO STATS,-1088,,,,,,#N/A,#N/A,#N/A,#N/A,0,#N/A,65%,1.1,1.1,1.3,0.65,0.65,0.75,0.04,0.1,,,,,,,,,
Controller3,EU,FR,URG,D,,0,0,0,0,0,0,0,#NAME?,0,#DIV/0!,#DIV/0!,#DIV/0!,1,,#N/A,0.00%,0.00%,#DIV/0!,NO STATS,-2159,,,,,,#N/A,#N/A,#N/A,#N/A,0,#N/A,65%,1.1,1.1,1.3,0.65,0.65,0.75,0.04,0.1,,,,,,,,,
Controller4,NA,STARR,STA,D,,4430,6440,3736,4430,6440,3736,693,#NAME?,0,145.38%,84.35%,18%,1,,No more Data disk,65.17%,19.18%,-2.18%,849,-96,,,,,,0,2,1,2,2,547,65%,1.1,1.1,1.3,0.65,0.65,0.75,0.04,0.1,,,,,,,,,

To pick out rows by their value in column 23, I currently run this:

awk -F "%*," '$23 > 4' myfile.csv

Result:

Controller1,NA,ASHEBORO,ASH,B,,3674,4572,1814,3674,4572,1814,1859,#NAME?,0,124.45%,49.39%,19%,1,,"Big Risk, No Spare disk",45.04%,4.35%,12.63%,160,464,,,,,,0,1,1,1,0,410,65%,1.1,1.1,1.3,0.65,0.65,0.75,0.04,0.1,,,,,,,,,
Controller4,NA,STARR,STA,D,,4430,6440,3736,4430,6440,3736,693,#NAME?,0,145.38%,84.35%,18%,1,,No more Data disk,65.17%,19.18%,-2.18%,849,-96,,,,,,0,2,1,2,2,547,65%,1.1,1.1,1.3,0.65,0.65,0.75,0.04,0.1,,,,,,,,,

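In my example I used a threshold of 4% on column 23. The goal is to retrieve every row whose percentage in column 23 is significantly high. The problem is that I can't rely on the 4% value, because it only fits the current table. So I need another way to retrieve the rows with the highest values in column 23.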

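I have to sort the controllers in descending order by the percentage in column 23, and I would like to work with the top 10% of the sorted rows, so that I can be sure I get the controllers with the largest percentages.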

The goal is to be able to adjust the percentage according to the number of rows in the table.

Do you have any suggestions?

Thanks! :)

3 Answers:

Answer 0: (score: 1)

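If you want to use standard tools, you will need to read the file twice. If you are willing to use perl, though, you can do the following. Note that the plain sort in this one-liner orders whole lines as strings, so adapt the comparison if you need a numeric sort on column 23: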

perl -e 'my @sorted = sort <>; print @sorted[0..$#sorted * .10]' input-file
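For the "standard tools" route, here is a rough sketch rather than a definitive solution. It assumes the question's field separator ("%*,") puts the value of interest in field 23 on every row, and it treats non-numeric cells such as #DIV/0! as 0. The function name topPercentByCol23 is made up for this example. It reads the file twice, once to count lines and once to rank them:

```shell
# Hypothetical helper: print the top PERCENT% of rows, ranked by column 23 (descending).
# Assumption: the question's separator "%*," yields the wanted value in field 23.
topPercentByCol23() {
  local percent="$1" file="$2" total keep
  total=$(wc -l < "$file")                 # first pass: count the rows
  keep=$(( total * percent / 100 ))        # integer division, rounds down
  # Second pass: prefix each row with its numeric key, sort on the key, drop the key.
  awk -F '%*,' '{ printf "%s\t%s\n", $23 + 0, $0 }' "$file" |
    LC_ALL=C sort -t $'\t' -k1,1nr |
    cut -f2- |
    head -n "$keep"
}
```

With the 140-row file from the question, topPercentByCol23 10 file.csv would keep 140 * 10 / 100 = 14 rows. Rows that tie on the key may come out in any relative order.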

Answer 1: (score: 0)

I could have sworn this question was a duplicate, but so far I haven't been able to find a similar one.

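Whether the file is sorted or not doesn't matter. You can extract the first NUMBER lines of any file with head -n NUMBER. There is no built-in way to give that number as a percentage, but you can work out how many lines (NUMBER) correspond to PERCENT% of the file.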

percentualHead() {
  percent="$1"
  file="$2"
  linesTotal="$(wc -l < "$file")"
  (( lines = linesTotal * percent / 100  ))
  head -n "$lines" "$file"
}

Or shorter, but less readable:

percentualHead() {
  head -n "$(( $(wc -l < "$2") * $1 / 100 ))" "$2"
}

Calling percentualHead 10 yourFile prints the first 10% of the lines of yourFile to standard output.
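One thing to be aware of is that the integer division rounds down, so 10% of a 5-line file is 0 lines. If you would rather get at least one line in that case, a possible variant rounds up instead. The name percentualHeadCeil is invented for this sketch:

```shell
# Hypothetical variant of percentualHead that rounds the line count up,
# so a small PERCENT of a short file still yields at least one line.
percentualHeadCeil() {
  local percent="$1" file="$2" linesTotal lines
  linesTotal=$(wc -l < "$file")
  lines=$(( (linesTotal * percent + 99) / 100 ))   # ceiling of linesTotal*percent/100
  head -n "$lines" "$file"
}
```

For a 140-line file and 10% both versions agree on 14 lines; they only differ when the percentage does not divide evenly.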

Note that percentualHead only works on regular files, because the file has to be read twice. It does not work with FIFOs, <() or pipes.
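If the input does come from a pipe, one possible workaround is to buffer it in memory and decide at the end how many lines to print. This is only a sketch, assuming the whole input fits in RAM. The name percentualHeadStdin is hypothetical:

```shell
# Hypothetical stdin variant: buffers all input lines, then prints the first PERCENT%.
# Assumption: the input is small enough to hold in memory.
percentualHeadStdin() {
  awk -v percent="$1" '
    { line[NR] = $0 }                          # remember every line by its number
    END {
      keep = int(NR * percent / 100)           # truncate, like the shell arithmetic
      for (i = 1; i <= keep; i++) print line[i]
    }'
}
```

For example, seq 1 20 | percentualHeadStdin 10 prints the lines 1 and 2.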

Answer 2: (score: 0)

Here is one for GNU awk that picks the top p% of records from a file, but outputs them in their original order of appearance:

$ awk -F, -v p=0.5 '               # 50 % of top $23 records
NR==FNR {                          # first run
    a[NR]=$23                      # hash percentages to a, NR as key
    next
}
FNR==1 {                           # second run, at beginning
    n=asorti(a,a,"@val_num_desc")  # sort percentages to descending order
    for(i=1;i<=n*p;i++)            # get only the top p %
        b[a[i]]                    # hash their NRs to b
}
(FNR in b)                         # top p % BUT not in order
' file file | cut -d, -f 23        # file processed twice, cut 23rd for demo
45.04%
19.18%
