I wrote a bash script that worked until today. For some reason, maybe a bug in an update, there is now a blank line at the beginning of the data that does not disappear with sed '/^$/d'. Yet "tail -n +2" does make it disappear, and the script works again. Since the problem has only happened once, I am hesitant to add the tail command to my code, because it would erase data if the error doesn't happen again. In short, I am looking for a sanity check.
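One way to hedge against that risk, as a minimal sketch: delete line 1 only when it is empty or whitespace-only, so a normal first row is never touched. This assumes the stray line holds nothing but whitespace (for example a carriage return, which '/^$/d' would not match); the actual cause is not known from the post, and the function name drop_leading_blank is made up for illustration.

```shell
# Hypothetical helper: remove line 1 only if it is empty or whitespace-only.
# Assumption: the stray first line contains only whitespace (spaces, tabs, CR).
drop_leading_blank() {
  sed '1{/^[[:space:]]*$/d;}'
}

# A leading line holding just a carriage return is dropped...
printf '\r\nJames Invest\n' | drop_leading_blank
# ...while input that starts with real data passes through unchanged.
printf 'James Invest\nCharles Spent\n' | drop_leading_blank
```

Unlike an unconditional "tail -n +2", this leaves the first line alone whenever it contains visible data.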
I cannot find anything on the web with an example of validating the last two columns of data as the condition for starting the next row.
I am posting the problem here, edited and hopefully clearer. I am not looking for a solution that counts columns and rows, but for one that validates the last two columns as being in the format "$1.00 $44,89987".
That is, the second-to-last column ($(NF-1)) is in currency format with a dollar sign and a decimal point, and the last column ($NF) is in currency format with a dollar sign, no decimal point, and sometimes a comma.
The solutions posted before THIS EDIT DO WORK with the example that was given. Some of them rely on there being exactly four columns and assume that the data is formatted correctly from the start (without the blank line at the beginning that I described in the first paragraph).
In my script, I break each column out onto a separate row and then recombine every 4 rows into one, similar to some of the solutions below. I apologize for my lack of clarity before this edit, and I appreciate the solutions that have been posted.
My original code is a long one-liner with many pipes that reformats a file containing chunks of data in rows (no columns at the start) into the desired format. (It uses a stored bash variable for yesterday's date.)
cat BiggestPayouts |perl -lape 's/\s+//sg'|sed 's/"//g'|sed '/^$/d'|awk 'length>2'|awk 'NR%4{printf $0" ";next;}1'|awk -v yest=$yest '{print yest"@"$1"@"$2"@"$3"@"$4}' >> BigPayouts.csv
However, I'm looking for an if-statement that validates the last two columns of data as the condition for starting the next row, as mentioned.
Perhaps someone can point out how to use awk, sed, perl, or a regex to look for any two consecutive columns in the currency format I described above (a dollar sign with a decimal, followed by a dollar sign with an optional comma and no decimal). Then I can put all the data into one row and have the code break it into new rows every time the condition is found.
Something like:
James Invest $1.00 $26,443 Charles Spent $0.20 $18,119 Sam Expense $0.50 $16,049 James Shared $0.50 $6,373 Charles Gave $1.00 $6,235 Sam Burned $1.00 $5,585
The output should end each row with the two currency columns and then start a new row, like this:
James Invest $1.00 $26,443
Charles Spent $0.20 $18,119
Sam Expense $0.50 $16,049
James Shared $0.50 $6,373
Charles Gave $1.00 $6,235
Sam Burned $1.00 $5,585
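The condition described above can be sketched in awk as follows. This is only an illustration under stated assumptions: the second-to-last column is taken to be a dollar sign, digits, a dot, and exactly two digits; the last column is taken to be a dollar sign followed by digits and optional commas; and the function name split_rows is invented here. Blank lines contribute no fields, so a stray empty first line does not shift the rows.

```shell
# Hypothetical helper: read whitespace-separated tokens on stdin and start a
# new row whenever two consecutive tokens look like "$1.00" then "$26,443".
# Assumed formats: $digits.dd followed by $digits with optional commas.
split_rows() {
  awk '
    {
      for (i = 1; i <= NF; i++) {
        # append the current token to the row being built
        row = (row == "" ? $i : row " " $i)
        if (prev ~ /^\$[0-9]+\.[0-9][0-9]$/ && $i ~ /^\$[0-9][0-9,]*$/) {
          print row          # both currency columns seen: the row is complete
          row = ""
          prev = ""
        } else {
          prev = $i          # remember this token for the next comparison
        }
      }
    }
    END { if (row != "") print row }   # flush any incomplete trailing row
  '
}

printf '%s\n' '' 'James Invest $1.00' '$26,443 Charles Spent $0.20' '$18,119' | split_rows
```

Because the break depends on the two currency patterns rather than on a count of four, a row with more or fewer name columns would still end at the right place.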
Answer 0 (score: 3)
It's disgraceful thinking if your first idea is to go straight to asking for help rather than trying to solve a problem yourself. It's also a shame that the contributors on Stack Overflow are encouraged by the offer of experience points to solve your problem for you rather than to help you to find your own solution. However, since there are already several solutions here, I may as well add my own.
This program creates a regex pattern $amt that matches a money value, and replaces with a newline any whitespace that follows two consecutive amounts and precedes a non-amount:
use strict;
use warnings 'all';
use v5.10;

my $data = do {
    local $/;
    <DATA>;
};

my $amt = qr/\$[\d.,]+/;

$data =~ s/\s+/ /g;
$data =~ s/ $amt \s+ $amt \K \s+ (?= [^\$\s] ) /\n/gx;

say $data;

__DATA__
James Invest $1.00
$26,443 Charles Spent $0.20
$18,119 Sam Expense $0.50
$16,049 James Shared $0.50 $6,373
Charles Gave $1.00
$6,235 Sam Burned $1.00
$5,585
Output:
James Invest $1.00 $26,443
Charles Spent $0.20 $18,119
Sam Expense $0.50 $16,049
James Shared $0.50 $6,373
Charles Gave $1.00 $6,235
Sam Burned $1.00 $5,585
Alternatively, if it is just a matter of printing the values four fields at a time, then the solution is much simpler:
use strict;
use warnings 'all';
use v5.10;

my @data;

while ( <DATA> ) {
    push @data, split;
    while ( @data >= 4 ) {
        my @row = splice @data, 0, 4;
        print "@row\n";
    }
}

print "@data\n" if @data;

__DATA__
James Invest $1.00
$26,443 Charles Spent $0.20
$18,119 Sam Expense $0.50
$16,049 James Shared $0.50 $6,373
Charles Gave $1.00
$6,235 Sam Burned $1.00
$5,585
The output is identical to that of my original solution.
Answer 1 (score: 2)
awk -vRS= '{for(i=1;i<=NF;i++)if(i%4){printf "%s ",$i}else{print $i}}' file
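With RS set to the empty string, awk reads blank-line-separated paragraphs as records, so the i%4 count restarts at each paragraph. If blank lines can appear inside the data, a running counter across all lines may be safer. A rough sketch of that variant, with the function name group_by_four made up for illustration:

```shell
# Hypothetical helper: emit four whitespace-separated tokens per output row,
# counting across all input lines so blank lines do not reset the grouping.
group_by_four() {
  awk '
    {
      for (i = 1; i <= NF; i++)
        printf "%s%s", $i, (++n % 4 ? " " : "\n")
    }
    END { if (n % 4) print "" }   # finish a trailing partial row
  '
}

printf 'James Invest $1.00\n\n$26,443 Charles Spent $0.20 $18,119\n' | group_by_four
```

This still counts fields rather than validating the currency columns, so it shares the four-column assumption of the original one-liner.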
Answer 2 (score: 2)
Perl solution:
perl -lane 'chomp;
push @B, @F;
print join " ", splice @B, 0, 4 while @B > 3
' input_file
chomp removes the trailing newline.
@B is used as a buffer.
-a splits the input on whitespace into the @F array.
-l adds a newline to print.
Answer 3 (score: 1)
Another awk solution: print "\n" after every four records.
awk -vRS="[ \n]+" '
NR%4!=1{printf OFS}
{printf "%s",$0;}
NR%4==0{printf "\n"}' file
You get:
James Invest $1.00 $26,443
Charles Spent $0.20 $18,119
Sam Expense $0.50 $16,049
James Shared $0.50 $6,373
Charles Gave $1.00 $6,235
Sam Burned $1.00 $5,585
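Note that a multi-character RS is treated as a regular expression by gawk and mawk, but POSIX leaves that behavior unspecified. A possible alternative that avoids it, sketched with standard tools: put one token per line, drop empty lines, and let paste join every four lines. The function name four_per_row is hypothetical.

```shell
# Hypothetical helper: one token per line, drop empty lines, then join
# every four lines with single spaces.
four_per_row() {
  tr -s '[:space:]' '\n' | sed '/^$/d' | paste -d ' ' - - - -
}

# The leading blank line would otherwise become an empty first token.
printf '\nJames Invest $1.00\n$26,443 Charles Spent $0.20\n$18,119\n' | four_per_row
```

The sed step matters here: without it, a blank line at the very start of the input survives tr as an empty line and shifts every row by one column.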