Awk/Sed/Perl - If two consecutive columns match two patterns, make a new row

Time: 2015-11-12 11:10:44

Tags: linux perl if-statement awk sed

I have a bash script that worked until today. For some reason, possibly a bug introduced by an update, the data now starts with a blank line that sed '/^$/d' does not remove. "tail -n +2" does remove it, and then the script works again. Since the problem has happened only once, I am hesitant to add the tail step to my code, because it would delete a line of real data if the blank line does not appear again. In short, I am looking for a sanity check.
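One possible sanity check, offered as a sketch rather than a diagnosis: instead of always discarding the first line with "tail -n +2", drop a line only when it merely looks blank. This assumes the stray line holds invisible characters such as spaces, tabs, a carriage return, or a UTF-8 byte-order mark (the bytes EF BB BF), none of which '^$' matches. Because the pipeline already strips ASCII whitespace with perl before sed runs, a byte-order mark is one plausible suspect; "head -n 2 BiggestPayouts | od -c" would show what the line really contains. The function name drop_blank_lines is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: remove lines that only *look* blank.
# Unlike "tail -n +2", it never deletes a line that carries real data.
drop_blank_lines() {
    # LC_ALL=C makes awk treat the input as raw bytes.
    LC_ALL=C awk '
        NR == 1 { sub(/^\357\273\277/, "") }  # strip a UTF-8 BOM from the first line
        !/^[ \t\r]*$/                          # print lines with something besides spaces, tabs, CR
    ' "$@"
}
```

It could sit at the head of the pipeline, for example "drop_blank_lines BiggestPayouts | perl -lape ...", and is a no-op when no blank-looking line is present.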

I cannot find any example on the web that validates the last two columns of data as the condition for starting the next row.

I am reposting the problem here, edited and hopefully clearer. I am not looking for a solution that counts columns and rows, but for one that validates that the last two columns have the format "$1.00 $44,89987".

That is, the second-to-last column ($(NF-1) in awk) is a currency amount with a dollar sign and a decimal point, and the last column ($NF) is a currency amount with a dollar sign, no decimal point, and sometimes a comma.
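A minimal awk sketch of that validation, assuming a "price" is a dollar sign, digits, a decimal point, and more digits, and a "total" is a dollar sign followed by digits and commas with no decimal point (loose enough to accept "$44,89987"). The function name valid_rows is hypothetical. Note that in awk the second-to-last field must be written $(NF-1); $NF-1 is the value of the last field minus one.

```shell
#!/bin/sh
# Hypothetical helper: print only the lines whose last two fields look like
# a price ($1.00) followed by a total ($26,443 or $5585).
valid_rows() {
    awk '
        # NF >= 2 guards against $(NF-1) pointing at $0 or a negative field.
        NF >= 2 && $(NF-1) ~ /^\$[0-9]+\.[0-9]+$/ && $NF ~ /^\$[0-9][0-9,]*$/
    ' "$@"
}
```

Lines that fail the check are simply not printed; swapping the pattern for "!( ... )" would instead list the malformed rows for inspection.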

The solutions posted before THIS EDIT DO WORK with the example that was given. Some of them use a count of four columns as the condition and assume that the data is formatted correctly from the start (without the leading blank line described in the first paragraph).

In my script, I split each column onto its own row and then recombine every 4 rows into one, similar to some of the solutions below. I apologize for my lack of clarity before this edit, and I appreciate the solutions from those who have posted them.

My original code is a long one-liner with many pipes that turns a file containing chunks of data in rows (no columns to begin with) into the desired format. It uses a stored bash variable, $yest, for yesterday's date:

cat BiggestPayouts |perl -lape 's/\s+//sg'|sed 's/"//g'|sed '/^$/d'|awk 'length>2'|awk 'NR%4{printf "%s ", $0;next;}1'|awk -v yest="$yest" '{print yest"@"$1"@"$2"@"$3"@"$4}' >> BigPayouts.csv

However, I'm looking for an if-statement to validate the last two columns of data as the condition for the next row, as mentioned.

Perhaps someone can point out how to use awk, sed, perl, or a regex to look for "any two consecutive columns" in the currency formats described above (a dollar sign with a decimal point, then a dollar sign with an optional comma and no decimal point). Then I could put all the data on one row and have the code break it into a new row every time the condition is found.

Something like:

James Invest $1.00 $26,443 Charles Spent $0.20 $18,119 Sam Expense $0.50 $16,049 James Shared $0.50 $6,373  Charles Gave $1.00 $6,235 Sam Burned $1.00 $5,585     

Each output row should end with the two currency columns, followed by a new row, like this:

James Invest $1.00 $26,443 
Charles Spent $0.20 $18,119 
Sam Expense $0.50 $16,049 
James Shared $0.50 $6,373 
Charles Gave $1.00 $6,235 
Sam Burned $1.00 $5,585 
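A minimal awk sketch of the pattern-based split asked for above, not one of the posted answers: it reads the input word by word, appends each word to the current row, and ends the row whenever a price is immediately followed by a total. The price and total patterns are assumptions based on the description in the question, and split_on_amounts is a hypothetical name. Because it does not count columns, line breaks, blank lines, and names of more than two words do not affect the result.

```shell
#!/bin/sh
# Hypothetical helper: start a new row after every "price total" pair.
split_on_amounts() {
    awk '
        {
            for (i = 1; i <= NF; i++) {
                row = (row == "" ? $i : row " " $i)    # append the word to the row
                if (prev ~ /^\$[0-9]+\.[0-9]+$/ && $i ~ /^\$[0-9][0-9,]*$/) {
                    print row                          # price then total: end the row
                    row = ""
                    prev = ""
                } else {
                    prev = $i
                }
            }
        }
        END { if (row != "") print row }               # words left without a closing pair
    ' "$@"
}
```

For example, "split_on_amounts BiggestPayouts" prints one row per payout. Any trailing words that never reach a price/total pair are printed as a final partial row instead of being silently dropped, which makes malformed input visible.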

4 answers:

Answer 0 (score: 3)

It's disgraceful thinking if your first idea is to go straight to asking for help rather than trying to solve a problem yourself. It's also a shame that the contributors on Stack Overflow are encouraged by the offer of experience points to solve your problem for you rather than to help you find your own solution. However, since there are already several solutions here, I may as well add my own.

This program builds a regex pattern, $amt, that matches a money value. It first collapses every run of whitespace into a single space, then replaces with a newline any space that follows two consecutive amounts and precedes a token that is not an amount.

use strict;
use warnings 'all';
use v5.10;

my $data = do {
    local $/;
    <DATA>;
};

my $amt = qr/\$[\d.,]+/;

$data =~ s/\s+/ /g;
$data =~ s/ $amt \s+ $amt \K \s+ (?= [^\$\s] ) /\n/gx;

say $data;

__DATA__
James Invest $1.00
$26,443 Charles Spent $0.20
$18,119 Sam Expense $0.50
$16,049 James Shared $0.50 $6,373
Charles Gave $1.00
$6,235 Sam Burned $1.00
$5,585

output

James Invest $1.00 $26,443
Charles Spent $0.20 $18,119
Sam Expense $0.50 $16,049
James Shared $0.50 $6,373
Charles Gave $1.00 $6,235
Sam Burned $1.00 $5,585 

Update

Alternatively, if it is just a matter of printing the values four fields at a time, the solution is much simpler:

use strict;
use warnings 'all';
use v5.10;

my @data;
while ( <DATA> ) {
    push @data, split;
    while ( @data >= 4 ) {
        my @row = splice @data, 0, 4;
        print "@row\n";
    }
}

print "@data\n" if @data;

__DATA__
James Invest $1.00
$26,443 Charles Spent $0.20
$18,119 Sam Expense $0.50
$16,049 James Shared $0.50 $6,373
Charles Gave $1.00
$6,235 Sam Burned $1.00
$5,585

The output is identical to that of my original solution.

Answer 1 (score: 2)

awk -vRS= '{for(i=1;i<=NF;i++)if(i%4){printf "%s ", $i}else{print $i}}' file

Answer 2 (score: 2)

Perl solution:

perl -lane 'chomp;
            push @B, @F;
            print join " ", splice @B, 0, 4 while @B > 3
           ' input_file
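  • chomp removes the trailing newline (redundant here, since -l already chomps each input line).
  • the array @B is used as a buffer; fields left over at the end (fewer than four) are never printed.
  • -a splits each input line on whitespace into the @F array.
  • -l adds a newline to each print.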

Answer 3 (score: 1)

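Another awk solution: treat every word as a record and print "\n" after every fourth record. It relies on a regular-expression RS, which GNU awk and mawk support but POSIX awk does not.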

awk -vRS="[ \n]+" '
    NR%4!=1{printf OFS}
    {printf "%s",$0;}
    NR%4==0{printf "\n"}' file

You get:

James Invest $1.00 $26,443
Charles Spent $0.20 $18,119
Sam Expense $0.50 $16,049
James Shared $0.50 $6,373
Charles Gave $1.00 $6,235
Sam Burned $1.00 $5,585
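For an awk without regular-expression RS support, a sketch of the same four-at-a-time idea can loop over the fields of each line and keep a running word count instead. Like the other count-based answers, it assumes every row has exactly four words; every_four is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: print the input words four per line, using only POSIX awk.
every_four() {
    awk '
        {
            for (i = 1; i <= NF; i++) {
                n++                                          # words seen so far, across lines
                printf "%s%s", $i, (n % 4 == 0 ? "\n" : " ")
            }
        }
        END { if (n % 4) printf "\n" }                       # terminate a partial last row
    ' "$@"
}
```

Since the counter spans lines and blank lines contribute no fields, a leading blank line does not shift the rows, but an extra or missing word anywhere would.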