How to replace newlines/line breaks with a single space, but only if they are inside start/end regexes?

Time: 2015-06-16 03:37:11

Tags: regex linux shell unix replace

I have a very large file (many GB in size) that looks like this:

[x data1 
 data2 data3
 data4 y]
[a data5 data 6 
 data7  
  data 8 b>
[x data y]
...and so on

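How can I replace newlines (possibly surrounded by whitespace) with a single space, but only when they fall inside a section delimited by the start regex "[x" and the end regex "y]"? The output should be: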

[x data1 data2 data3 data4 y]
[a data5 data 6 
 data7  
  data 8 b>
[x data y]

2 answers:

Answer 0: (score: 1)

You can do this with awk:

awk '/\[x/{f=1} {if(f)printf "%s",$0; else print $0;} /y\]/{print ""; f=0}'

Output:

[x data1  data2 data3 data4 y]
[a data5 data 6 
 data7  
  data 8 b>
[x data y]
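
The command above joins the lines of a block as they are, so the blanks around each removed newline survive (note the two spaces between data1 and data2 in the output). If exactly one space is wanted, as the question asks, one option is to trim each line before appending it. The following is only a minimal sketch under that assumption; the function name join_blocks and the awk helper trim are hypothetical names introduced here, not part of the answer.

```shell
# join_blocks: hypothetical helper; reads stdin, writes stdout.
join_blocks() {
  awk '
    # Strip leading and trailing blanks from a string.
    function trim(s) { gsub(/^[ \t]+|[ \t]+$/, "", s); return s }

    # Inside a block: append the trimmed line, separated by one space.
    inblock {
      line = trim($0)
      if (line != "") buf = buf " " line
      if (/y\]/) { print buf; inblock = 0 }
      next
    }

    # A line containing "[x" opens a block unless "y]" closes it on the same line.
    /\[x/ && !/y\]/ { buf = trim($0); inblock = 1; next }

    # Every other line passes through unchanged.
    { print }

    # Flush an unterminated block at end of input.
    END { if (inblock) print buf }
  '
}
```

It could be used as a filter, for example join_blocks < input.txt > output.txt (both file names are placeholders). Only the current block is held in memory, so the file size should not matter as long as individual blocks are small.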

You can also simplify this to:

awk '/\[x/,/y\]/{ORS=""; if(/y\]/) ORS="\n";}{print}'

Output:

[x data1  data2 data3 data4 y]
[a data5 data 6 
 data7  
  data 8 b>
[x data y]
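
Neither command names an input file, so awk reads standard input. For a file of many GB it is probably more convenient to pass the file as an argument and redirect the result to a new file. A small sketch that recreates the sample input and runs the range-pattern command this way; the file names sample_input.txt and sample_output.txt are placeholders chosen for illustration.

```shell
# Hypothetical file names, for illustration only.
in_file=sample_input.txt
out_file=sample_output.txt

# Recreate the sample input from the question, including its stray blanks.
printf '%s\n' \
  '[x data1 ' \
  ' data2 data3' \
  ' data4 y]' \
  '[a data5 data 6 ' \
  ' data7  ' \
  '  data 8 b>' \
  '[x data y]' > "$in_file"

# Range-pattern command from the answer, reading a file and writing a new one.
awk '/\[x/,/y\]/{ORS=""; if(/y\]/) ORS="\n";}{print}' "$in_file" > "$out_file"
```

awk handles one line at a time here, so memory use does not grow with the size of the file.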

Answer 1: (score: 0)

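This is not a pure regex solution. It is a Perl script that reads the input file line by line and applies the logic the OP asked for.

You can modify the script as needed to get slightly different logic.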

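use strict;
use warnings;

# The input file path is the first command-line argument.
my $inFile = $ARGV[0];

my $fh;
open($fh, '<', $inFile) || die "Cannot open $inFile: $!";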

my $outFh;
open($outFh, '>', "out.txt") || die "Cannot open out.txt: $!";

my $inStr = 0;
my $finalStringBuf = "";

while (my $row = <$fh>) 
{
    chomp $row;

    if ($row =~ /^\[x/)
    {   
        $row =~ s/^\s+|\s+$//g; # ltrim and rtrim (remove whitespace before and after the string)

        if ($row =~ /y\]$/)
        {
            # if the row ends with a 'y]' also, just print the row
            print $outFh $row . "\n";   #print to output
        }
        else
        {
            # the '[x' block continues on later rows: start buffering
            $inStr = 1;
            $finalStringBuf = $row; # reset the buffer to a new string
        }
    }
    elsif ($inStr == 1 && $row =~ /y\]\s*$/)
    {
        $row =~ s/^\s+|\s+$//g; # ltrim and rtrim (remove whitespace before and after the string)

        # the row ends with 'y]', so the open '[x' block is complete
        $inStr = 0;
        $finalStringBuf .= ' ' . $row; # append the last part of the block, separated by a single space

        # chomp already removed the newlines, so the buffer is a single line
        print $outFh $finalStringBuf . "\n";
    }
    elsif ($inStr == 1)
    {
        # inside an '[x' block: trim the row and append it with a single space
        $row =~ s/^\s+|\s+$//g;
        $finalStringBuf .= ' ' . $row if length $row;
    }
    else
    {
        print $outFh $row . "\n";   #print to output
    }
}

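# Flush an unterminated '[x' block so that no input is silently dropped.
print $outFh $finalStringBuf . "\n" if $inStr == 1;

close ($fh);
close ($outFh);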