I have a CSV file containing lines like these:
"AAAAA","ABC","355 69 2000405 / 2000407"
"BBBBB","ABC","1 87630444120 000 / 005"
I would like to get the following output:
"AAAAA","ABC","355 69 2000405"
"AAAAA","ABC","355 69 2000406"
"AAAAA","ABC","355 69 2000407"
"BBBBB","ABC","1 87630444120 000"
"BBBBB","ABC","1 87630444120 001"
"BBBBB","ABC","1 87630444120 002"
"BBBBB","ABC","1 87630444120 003"
"BBBBB","ABC","1 87630444120 004"
"BBBBB","ABC","1 87630444120 005"
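As shown above, the number before the slash should be the start of the range and the number after the slash the end of the range. The end value replaces only the last space-separated group and keeps its zero padding, so "000 / 005" means 000 through 005. The other columns must be repeated on every output line.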
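The expansion rule can be sketched in bash for the third column's value on its own. This is a minimal sketch, not a full solution: the function name expand_value is hypothetical, and it assumes the value has the form "<prefix> <start> / <end>", where <start> is the last space-separated group before the slash and <end> contains only digits apart from surrounding spaces or quotes.

```shell
#!/usr/bin/env bash
# Hypothetical helper: expand one "<prefix> <start> / <end>" value into its range.
expand_value() {
    local value=$1
    local head=${value%%/*}      # text before the slash, e.g. "355 69 2000405 "
    head=${head% }               # drop the space before the slash
    local start=${head##* }      # last group before the slash, e.g. "2000405" or "000"
    local prefix=${head% *}      # everything before that group, e.g. "355 69"
    local end=${value##*/}       # text after the slash
    end=${end//[!0-9]/}          # keep digits only, e.g. "2000407" or "005"
    local n
    # 10# forces base 10 so "008" is not parsed as octal;
    # %0*d pads each number to the width of the start group.
    for ((n = 10#$start; n <= 10#$end; n++)); do
        printf '%s %0*d\n' "$prefix" "${#start}" "$n"
    done
}

expand_value "1 87630444120 000 / 005"
```

Keeping the width of the start group is what turns 0..5 back into 000..005; the other columns would simply be printed in front of each expanded value.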
I tried this in Perl. It produces some output, but not what I need.
Any help is appreciated.
Answer 0 (score: 1)
Maybe something like this:
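#!/usr/bin/perl
use strict;
use warnings;

while (<>) {
    # Parse the input into three useful bits; skip lines without "start / end"
    my ($data, $start, $end) = m|(.* )(\d+) / (\d+)|
        or next;
    # Use $start and $end to control repetition.
    # A start with leading zeros (e.g. "000") makes Perl's .. operator use
    # string increment, so the padding is kept: "000", "001", ..., "005".
    for my $x ($start .. $end) {
        print qq[$data$x"\n];
    }
}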
Call it like this:
$ ./this_program your_input_file > some_output_file
Answer 1 (score: 0)
This works as long as the format of the input data does not change.
#!/bin/bash
while IFS= read -r line; do
    head=${line%%/*}            # everything before the slash
    head=${head% }              # drop the space before the slash
    start=${head##* }           # number before the slash, e.g. 2000405 or 000
    prefix=${head% *}           # rest of the line, e.g. "AAAAA","ABC","355 69
    end=${line##*/}
    end=${end//[!0-9]/}         # number after the slash, closing quote removed
    # 10# forces base 10 so values such as 008 are not read as octal;
    # %0*d pads each number to the width of the start value.
    for ((n = 10#$start; n <= 10#$end; n++)); do
        printf '%s %0*d"\n' "$prefix" "${#start}" "$n"
    done
done < <(awk -F'#' '{print $1}' myfile.csv | grep -Ev '^[[:space:]]*$')
This assumes your data is in myfile.csv. The script also strips comments (anything after a # character) and blank lines from the input file.