I have a very large file (many GB in size) that looks like
[x data1
data2 data3
data4 y]
[a data5 data 6
data7
data 8 b>
[x data y]
...and so on
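How can I replace newlines (possibly surrounded by spaces) with a single space, but only when they fall inside a section delimited by the patterns [x and y]? The output should be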
[x data1 data2 data3 data4 y]
[a data5 data 6
data7
data 8 b>
[x data y]
Answer 0 (score: 1)
You can do this with awk:
awk '/\[x/{f=1} f{gsub(/^[ \t]+|[ \t]+$/,""); if(/y\]/){print; f=0} else printf "%s ",$0; next} {print}'
Output:
[x data1 data2 data3 data4 y]
[a data5 data 6
data7
data 8 b>
[x data y]
You can also simplify it to:
awk '/\[x/,/y\]/{gsub(/^[ \t]+|[ \t]+$/,""); ORS=(/y\]/ ? "\n" : " ")} {print}'
Output:
[x data1 data2 data3 data4 y]
[a data5 data 6
data7
data 8 b>
[x data y]
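The flag-based approach can be spelled out as a commented awk program. This is a minimal sketch, not the answerer's exact code: the function name join_sections and the variable name inside are made up for illustration, and it assumes sections do not nest. It streams line by line, so memory use stays small regardless of file size.

```shell
# join_sections (hypothetical helper): read stdin, or the files given as
# arguments, and join the lines of each [x ... y] section with single spaces.
join_sections() {
    awk '
        /\[x/ { inside = 1 }                  # a section starts on this line
        inside {
            gsub(/^[ \t]+|[ \t]+$/, "")       # drop spaces around the newline
            if (/y\]/) { print; inside = 0 }  # last line: end with a newline
            else       { printf "%s ", $0 }   # otherwise: end with a space
            next
        }
        { print }                             # outside a section: unchanged
        END { if (inside) print "" }          # terminate an unclosed section
    ' "$@"
}

# Sample input from the question.
join_sections <<'EOF'
[x data1
data2 data3
data4 y]
[a data5 data 6
data7
data 8 b>
[x data y]
EOF
```

For a real file, pass the path as an argument and redirect the output, for example join_sections big.txt > joined.txt (both file names are placeholders).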
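Answer 1 (score: 0)
This does not use a single regular expression directly; it is a Perl script that reads the input file line by line and applies the logic the OP asked for. It takes the input path as its first argument and writes the result to out.txt.
You can modify the script as needed to get slightly different logic.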
use strict;
use warnings;

my $inFile = $ARGV[0] // die "usage: $0 <input-file>\n";
open(my $fh, '<', $inFile) || die "cannot open $inFile: $!";
open(my $outFh, '>', 'out.txt') || die "cannot open out.txt: $!";
my $inStr = 0;            # 1 while inside an unterminated '[x ... y]' section
my $finalStringBuf = "";  # accumulates the joined section
while (my $row = <$fh>)
{
    chomp $row;
    if (!$inStr && $row =~ /^\[x/)
    {
        $row =~ s/^\s+|\s+$//g; # ltrim and rtrim (remove whitespace before and after the string)
        if ($row =~ /y\]$/)
        {
            # if the row also ends with 'y]', just print the row
            print $outFh $row . "\n";
        }
        else
        {
            # row starts with '[x': reset the buffer to a new string
            $inStr = 1;
            $finalStringBuf = $row;
        }
    }
    elsif ($inStr == 1)
    {
        $row =~ s/^\s+|\s+$//g; # ltrim and rtrim
        # append the row to our buffer with a single space in between
        $finalStringBuf .= ' ' . $row;
        if ($row =~ /y\]$/)
        {
            # row ends with 'y]': the section is complete
            $inStr = 0;
            print $outFh $finalStringBuf . "\n";
        }
    }
    else
    {
        print $outFh $row . "\n"; # outside a section: print unchanged
    }
}
print $outFh $finalStringBuf . "\n" if $inStr; # flush a section that never closed
close($fh);
close($outFh);