Sort lines by whether they contain numbers, ignoring numbers attached to letters

Posted: 2012-03-23 00:42:29

Tags: python ruby perl bash sed

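I need to sort the lines in a file so that every line containing at least one number (0-9) is moved to the end of the file. The numbers 1-5 do not count when they are directly preceded by one of these letters: "a", "e", "g", "i", "n", "o", "r", "u", "v" or "u:" (u + :).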

Here is a sample file:

I want to buy some food.
I want 3 chickens.
I have no3 basket for the eggs.
I have no3 basket which can hold 24 eggs.
Move the king to A3.
Can you move the king to a6?

Here is the sample file, annotated with which lines match:

I want to buy some food. % does not match
I want 3 chickens. % matches
I have no3 basket for the eggs. % does not match, because "3" is preceded by "o"
I have no3 basket which can hold 24 eggs. % matches, because contains "24"
Move the king to A3. % matches, numbers preceded by "A" are not ignored.
Can you move the king to a6? % matches, 6 is not 1-5
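The rule above fits in one regular expression with two lookbehinds. The following is a minimal Python 3 sketch, not taken from any answer (the names `MATCHING_DIGIT` and `is_match` are hypothetical). It assumes the letters are matched case-sensitively, which is what the "A3" annotation implies.

```python
import re

# A digit that makes a line "match": 0 or 6-9 anywhere, or 1-5 when it is
# not directly preceded by a, e, g, i, n, o, r, u, v (lowercase only) or "u:".
# Both lookbehinds are fixed-width, which Python's re module requires.
MATCHING_DIGIT = re.compile(r"[06-9]|(?<![aeginoruv])(?<!u:)[1-5]")


def is_match(line):
    """Return True if the line contains at least one digit that is not ignored."""
    return MATCHING_DIGIT.search(line) is not None


if __name__ == "__main__":
    for line in ["I want 3 chickens.", "I have no3 basket for the eggs.",
                 "Move the king to A3.", "Can you move the king to a6?"]:
        print(is_match(line), line)
```

The separate `(?<!u:)` check is needed because in "u:3" the character right before the digit is ":", which the letter class does not cover.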

The output would put all the matching lines at the bottom:

I want to buy some food.
I have no3 basket for the eggs.
I want 3 chickens.
Move the king to A3.
Can you move the king to a6?
I have no3 basket which can hold 24 eggs.

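Preferably (though not required), the solution would also sort the lines with the most matching digits to the end. E.g. "I have 10 chickens and 12 bats." (4 digits) would appear after "I have 99 chickens." (2 digits).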
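One way to get this optional ordering is to sort every line by its number of matching digits. Lines with a count of 0 are the non-matching ones, so they stay at the top. Python's `sorted` is guaranteed stable, so lines with equal counts keep their input order. This is a sketch under one assumption: "matching digits" is read as digits that are not ignored. `count_matching_digits` and `sort_lines` are hypothetical names.

```python
import re

# Same per-digit rule as the question: 0 and 6-9 always count, 1-5 count
# unless directly preceded by a, e, g, i, n, o, r, u, v or by "u:".
MATCHING_DIGIT = re.compile(r"[06-9]|(?<![aeginoruv])(?<!u:)[1-5]")


def count_matching_digits(line):
    """Number of digits in the line that are not ignored."""
    return len(MATCHING_DIGIT.findall(line))


def sort_lines(lines):
    # sorted() is stable: equal counts keep their input order, and lines
    # with a count of 0 (the non-matching ones) come first.
    return sorted(lines, key=count_matching_digits)


if __name__ == "__main__":
    for line in sort_lines(["I have 10 chickens and 12 bats.",
                            "I want to buy some food.",
                            "I have 99 chickens."]):
        print(line)
```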

Solutions using Bash, Perl, Python 2.7, Ruby, sed, awk or grep are all fine.

7 Answers

Answer 0 (score: 5)

If your grep supports the -P (perl-regexp) option:

pat='(?<=[^0-9]|^)((?<!u:)(?<![aeginoruv])[1-5]|[06-9])'

{ grep -vP "$pat" input.txt; grep -P "$pat" input.txt; } >output.txt
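For comparison, here is a hedged Python 3 sketch of the same partition (`PAT` and `partition_lines` are hypothetical names). Python's `re` module only accepts fixed-width lookbehind, so `(?<=[^0-9]|^)` is written as the equivalent `(?<![0-9])`: "not preceded by a digit", which also holds at the start of the line. Because of this lookbehind, only the first digit of each number is tested. A number like "no34" is therefore ignored as a whole here, whereas answer 1's per-digit pattern would match it on the 4. Unlike the two grep calls, this reads the input only once.

```python
import re

# Port of the grep -P pattern. (?<![0-9]) replaces (?<=[^0-9]|^): the digit
# must start a number. It counts if it is 0 or 6-9, or if it is 1-5 and is
# not directly preceded by one of the listed letters or by "u:".
PAT = re.compile(r"(?<![0-9])((?<!u:)(?<![aeginoruv])[1-5]|[06-9])")


def partition_lines(lines):
    """Non-matching lines first, then matching lines, each in input order."""
    no_match = [line for line in lines if not PAT.search(line)]
    match = [line for line in lines if PAT.search(line)]
    return no_match + match
```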

If you have ssed (super sed) installed:

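ssed -nR '
# matching line: append it to the hold space; delete it unless it is the last line
/(?<=[^0-9]|^)((?<!u:)(?<![aeginoruv])[1-5]|[06-9])/{
    H
    $!d
    b flush
}
# non-matching line (including a non-matching last line): print it right away
p
:flush
# last line: print the held matching lines, minus the leading newline
${
    g
    s/\n//p
}' input.txt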
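The ssed script makes a single pass: it prints non-matching lines immediately and collects matching ones in the hold space until the last line. The same idea in Python is a small generator. This is a sketch, and `stream_partition` is a hypothetical name. Because it reads each line once, it also works on a pipe that cannot be read twice.

```python
import re

# Same pattern as the grep/ssed answer, with fixed-width lookbehind for Python.
PAT = re.compile(r"(?<![0-9])((?<!u:)(?<![aeginoruv])[1-5]|[06-9])")


def stream_partition(lines):
    """Yield non-matching lines as they arrive; hold matching ones until the end."""
    held = []  # plays the role of sed's hold space
    for line in lines:
        if PAT.search(line):
            held.append(line)
        else:
            yield line
    # End of input: flush the held lines, like the "$" block in the sed script.
    yield from held
```

Typical use would be `sys.stdout.writelines(stream_partition(sys.stdin))`.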

Answer 1 (score: 3)

When this program is run on your data set:

#!/usr/bin/env perl    
use strict;
use warnings;

my @moved = ();

my $pat = qr{
      [67890]                   # these big digits anywhere, or else...
    | (?<! [aeginoruv]   )      # none of those letters before
      (?<! u:            )      # nor a "u:" before
      [12345]                   # these little digits
}x;

while (<>) {
    if (/$pat/) {
        push @moved, $_;
    } else {
        print;
    }
}

print @moved;

it produces the output you want:

I want to buy some food.
I have no3 basket for the eggs.
I want 3 chickens.
I have no3 basket which can hold 24 eggs.
Move the king to A3.
Can you move the king to a6?

EDIT

To incorporate the sort, change the final print to:

print for sort {
    $a =~ y/0-9// <=> $b =~ y/0-9//
} @moved;

Now the output will be:

I want to buy some food.
I have no3 basket for the eggs.
I want 3 chickens.
Move the king to A3.
Can you move the king to a6?
I have no3 basket which can hold 24 eggs.

Answer 2 (score: 1)

This sounds like a job for perl!

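Seriously, sed will have a hard time meeting the requirement of moving the "u:" lines to the end of the file, because sed is really line-based. awk could do it, but perl is probably better.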

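Match the lines containing numbers with \d+

then filter out your letters with [aeginorv]\d+

and use u:\d+ to handle your special case u:stuff (you'll have to buffer the matching lines [e.g. just store them in an array] so you can output them at the end).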
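The idea of first removing the ignored letter+digit pairs and then checking whether any digit is left can be sketched in Python. This is an illustration, not the answerer's code. `IGNORED`, `strip_ignored` and `move_digit_lines` are hypothetical names. The letter set and the 1-5 range follow the question, so "u" is included, unlike the answer's [aeginorv]\d+.

```python
import re

# Ignored pairs from the question: "u:" + digit 1-5, or one of the listed
# lowercase letters + digit 1-5. re.sub deletes every such pair.
IGNORED = re.compile(r"u:[1-5]|[aeginoruv][1-5]")


def strip_ignored(line):
    return IGNORED.sub("", line)


def move_digit_lines(lines):
    """Buffer lines that still contain a digit after stripping; output them last."""
    kept, moved = [], []
    for line in lines:
        target = moved if re.search(r"[0-9]", strip_ignored(line)) else kept
        target.append(line)
    return kept + moved
```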

Answer 3 (score: 1)

[Edited, since everyone else has code that accepts a file argument:]

For a non-regex solution in Python, how about:
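import sys

def keyfunc(s):
    # Count the digits in s that are not ignored: a digit 1-5 is ignored when
    # the text before it ends with one of these letters or with "u:".
    ignores = ("a", "e", "g", "i", "n", "o", "r", "u", "v", "u:")
    return sum(c.isdigit() and not (1 <= int(c) <= 5 and s[:i].endswith(ignores))
               for i,c in enumerate(s))

# sorted() is stable: lines with a count of 0 stay first, in their input order
with open(sys.argv[1]) as infile:
    for line in sorted(infile, key=keyfunc):
        print line,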

Produces (with the question's two example sentences appended to the input):

I want to buy some food.
I have no3 basket for the eggs.
I want 3 chickens.
Move the king to A3.
Can you move the king to a6?
I have no3 basket which can hold 24 eggs.
I have 99 chickens.
I have 10 chickens and 12 bats.

Answer 4 (score: 1)

use strict;
use warnings;
use v5.10.1;
my @matches;
my @no_matches;
while (my $line = <DATA>) {
    chomp $line;

    if ($line =~ / \d+\W/) {
        #say "MATCH $line"; 
        push @matches, $line;
    }
    elsif ($line =~ /u:[1-5]+\b/) {
        #say "NOMATCH   $line"; 
        push @no_matches, $line;
    }
    elsif ($line =~ /[^aeginoruv][1-5]+\b/) {
        #say "MATCH $line"; 
        push @matches, $line;
    }
    elsif ($line =~ /.[6-90]/) {
        #say "MATCH $line"; 
        push @matches, $line;
    }
    else {
        #say "NOMATCH   $line";
        push @no_matches, $line;
    }
}

foreach (@no_matches){
    say $_;
}
foreach (@matches){
    say $_;
}

__DATA__
I want to buy some food.
I want 3 chickens.
I have no3 basket for the eggs.
I have no3 basket which can hold 24 eggs.
What is u:34?                              <- custom test 
Move the king to A3.
Can you move the king to a6?

prompt> perl regex.pl

I want to buy some food.
I have no3 basket for the eggs.
What is u:34?                              <- custom test
I want 3 chickens.
I have no3 basket which can hold 24 eggs.
Move the king to A3.
Can you move the king to a6?

Answer 5 (score: 1)

Ruby

(EDIT: now includes the optional sort)

matches = []
non_matches = []
File.foreach("lines.txt") do |line|
  if line.match(/[67890]|(?<![aeginoruv])(?<!u:)[12345]/)
    matches.push line
  else
    non_matches.push line
  end
end
# sort_by is not guaranteed to be stable, so ties are broken by input position
puts non_matches + matches.sort_by.with_index { |m, i| [m.scan(/\d/).length, i] }

produces

I want to buy some food.
I have no3 basket for the eggs.
I want 3 chickens.
Move the king to A3.
Can you move the king to a6?
I have no3 basket which can hold 24 eggs.

Answer 6 (score: 1)

This might work for you:

sed 'h;s/[aeginoruv][1-5]\|u:[1-5]//g;s/[^0-9]//g;s/^$/0/;G;s/\n/\t/' file |
sort -sn |
sed 's/^[^\t]*\t//'
I want to buy some food.
I have no3 basket for the eggs.
I want 3 chickens.
Move the king to A3.
Can you move the king to a6?
I have no3 basket which can hold 24 eggs.

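Basically it's a three-step process:

  1. Create a numeric key to sort the output by. Lines that don't need to move get the key 0; every other line gets the number formed by its remaining (non-ignored) digits.
  2. Sort by the numeric key, keeping lines with equal keys in their original order (-s).
  3. Remove the numeric key.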
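The same decorate, sort, undecorate steps can be mirrored in Python as a rough sketch (`sed_style_key` and `sort_like_pipeline` are hypothetical names). The key is built the way the sed script builds it: delete the ignored pairs, keep the remaining digits, and use 0 when none are left. Like `sort -n`, this orders lines by the numeric value of the concatenated digits, not by how many digits there are. The two usually agree, but not always. "05" becomes 5, and a line whose only remaining digit is 0 (e.g. "I have 0 cats.") gets key 0, so it stays among the non-matching lines.

```python
import re

# The pairs the sed script deletes: letter + 1-5, or "u:" + 1-5.
IGNORED = re.compile(r"[aeginoruv][1-5]|u:[1-5]")


def sed_style_key(line):
    """Remaining digits read as one number, 0 if none (steps 1 of the pipeline)."""
    digits = re.sub(r"[^0-9]", "", IGNORED.sub("", line))
    return int(digits) if digits else 0


def sort_like_pipeline(lines):
    # sorted() is stable, like `sort -s`, so equal keys keep their input order;
    # the key is never attached to the line, so nothing has to be removed after.
    return sorted(lines, key=sed_style_key)
```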