How to expand a line into multiple lines over a numeric range using Linux regex commands

Date: 2016-02-25 11:12:31

Tags: regex linux bash perl csv

I have a CSV file containing lines like the following:

"AAAAA","ABC","355 69 2000405 / 2000407"
"BBBBB","ABC","1 87630444120 000 / 005"

I would like to get output like this:

"AAAAA","ABC","355 69 2000405"
"AAAAA","ABC","355 69 2000406"
"AAAAA","ABC","355 69 2000407"
"BBBBB","ABC","1 87630444120 000"
"BBBBB","ABC","1 87630444120 001"
"BBBBB","ABC","1 87630444120 002"
"BBBBB","ABC","1 87630444120 003"
"BBBBB","ABC","1 87630444120 004"
"BBBBB","ABC","1 87630444120 005"

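As shown above, I want the number before the slash to be the start of the range and the number after the slash to be the end of the range. The other columns need to be repeated on every generated line.

I tried this with Perl and it produced some results, but not the ones I need.

Any help is appreciated.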

2 Answers:

Answer 0: (score: 1)

Maybe something like this.

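#!/usr/bin/perl

use strict;
use warnings;

while (<>) {
  # Parse the input into three useful bits: everything before the
  # range, the start of the range, and the end of the range.
  # Assumption: lines without a range are printed unchanged.
  my ($data, $start, $end) = m|(.* )(\d+) / (\d+)|
    or do { print; next };

  # Use $start and $end to control repetition. A start value with a
  # leading zero (e.g. "000") is incremented as a string, so the
  # zero padding is kept.
  for my $x ($start .. $end) {
    print qq[$data$x"\n];
  }
}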

Call it like this:

$ ./this_program your_input_file > some_output_file
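
The parse-then-loop structure of the Perl script above can also be sketched with awk, for systems where Perl is not at hand. This is only a sketch under assumptions: the range is the last thing inside the final quoted field, lines without a range are printed unchanged, and the function name expand_ranges is made up for illustration.

```bash
# expand_ranges is a hypothetical helper name; it reads the named files or stdin.
expand_ranges() {
  awk '
    # Lines that do not end in: <start> / <end>" are passed through unchanged.
    !match($0, /[0-9]+ \/ [0-9]+"[ \t]*$/) { print; next }
    {
      prefix = substr($0, 1, RSTART - 1)    # everything before <start>
      range  = substr($0, RSTART, RLENGTH)  # <start> / <end>" plus trailing blanks
      sub(/"[ \t]*$/, "", range)            # drop the closing quote
      split(range, ends, " / ")
      # Pad to the width of <start> so that 000 / 005 yields 000 ... 005.
      fmt = "%s%0" length(ends[1]) "d\"\n"
      for (n = ends[1] + 0; n <= ends[2] + 0; n++)
        printf fmt, prefix, n
    }
  ' "$@"
}
```

Usage would mirror the Perl version, for example: expand_ranges your_input_file > some_output_file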

Answer 1: (score: 0)

This works fine as long as the input format of your data does not change much.

#!/bin/bash

# A data line ends in: <start> / <end>" (optionally followed by whitespace).
re='^(.* )([0-9]+) / ([0-9]+)"[[:space:]]*$'

while IFS= read -r line; do
  line="${line%%#*}"                  # strip comments
  [[ $line =~ $re ]] || continue      # skip blank and non-range lines
  rest_of_line="${BASH_REMATCH[1]}"
  number_before_slash="${BASH_REMATCH[2]}"
  number_after_slash="${BASH_REMATCH[3]}"

  # 10# forces base 10, so zero-padded values such as 008 are not parsed
  # as octal; the output is padded to the width of the start value.
  for ((n = 10#$number_before_slash; n <= 10#$number_after_slash; n++)); do
    printf '%s%0*d"\n' "$rest_of_line" "${#number_before_slash}" "$n"
  done
done < myfile.csv

Your data is, of course, expected to be in myfile.csv. The script also strips comments and blank lines from the input file.