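I have a text file in which the lines are indented with leading spaces.
The indentation pattern repeats, irregularly, until the end of the file, as in the sample below.
I want to read these lines from the file and save them according to this structure.
The text file looks like this (# represents a space):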
ABC
##EFG"123"
####<HIJK> 22: test file
######LMNOP "Test"
######sssstt"123"
QRS
##TU"223"
####<www> 32: test2 file
######yz test1
####<www> 88: test3 file
######rreeeww
######oooiiiii
##PP
##ss
####<qqq> 89: test6 file
######hhhhggg
######bbbbaaa
######cccczzz
######uu test3
[Image: expected output]
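Each # in the sample stands for one space, and every nesting level adds two of them, so the first step inside a read loop is to turn the leading spaces into a level number. A minimal sketch, assuming exactly two real spaces per level; the helper name parse_indent and the in-memory sample are illustrative, not from the question:

```perl
use strict;
use warnings;

# Hypothetical helper: returns (level, text) for one chomped line, assuming
# each nesting level is indented by exactly two spaces.
sub parse_indent {
    my ($line) = @_;
    my ($indent, $text) = $line =~ /^( *)(.*)$/;
    return (length($indent) / 2, $text);
}

# In-memory stand-in for the real file (real spaces instead of "#").
my $sample = "ABC\n  EFG\"123\"\n    <HIJK> 22: test file\n      LMNOP \"Test\"\n";
open(my $fh, '<', \$sample) or die "Cannot open in-memory file: $!";
while (my $row = <$fh>) {
    chomp $row;
    my ($level, $text) = parse_indent($row);
    print "$level\t$text\n";    # level, a tab, then the unindented text
}
close $fh;
```

With the level known, deciding which CSV column a line belongs to becomes a simple lookup.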
I'm new to Perl. I know how to open a file and read it line by line, but I don't understand how to store this structure in CSV columns.
my $file = 'C:\\outputfile.txt';
open(my $fh, '<:encoding(UTF-8)', $file) or die "Could not open file '$file' $!";
while (my $row = <$fh>) { # reading each row till end of file
chomp $row;
# what should be done here?
}
Please help.
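Several of the values contain double quotes (for example EFG"123"), so writing them into CSV columns as-is can produce a file that spreadsheet programs misread. A minimal sketch of the usual quoting rule (wrap the field in double quotes and double any embedded quote); the helper names csv_field and csv_line are hypothetical, and the CPAN module Text::CSV is the more robust choice for real files:

```perl
use strict;
use warnings;

# Hypothetical helper: quote one field for CSV output. Fields containing a
# comma, a double quote, or a line break are wrapped in double quotes, and
# embedded double quotes are doubled (the RFC 4180 convention).
sub csv_field {
    my ($field) = @_;
    return $field unless $field =~ /[",\r\n]/;
    (my $quoted = $field) =~ s/"/""/g;
    return qq{"$quoted"};
}

# Hypothetical helper: build one CSV line from a list of fields.
sub csv_line {
    return join(',', map { csv_field($_) } @_);
}

print csv_line('ABC', 'EFG"123"', '<HIJK>', '22', 'test file'), "\n";
# prints: ABC,"EFG""123""",<HIJK>,22,test file
```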
Answer 0 (score: 1)
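If you have questions about the code, I'm happy to answer them, but be warned: this is not a good example of Perl code, let alone the best one. It was written quickly.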
use strict;
use warnings;
my $previous_count = -1; # at the start, assume there were no leading spaces
my $current_count = 0; # default level of the current line
my $maximum_count = 3; # as you said
my $to_written = "";
my $delimiter_between_columns = ",";
my $newline_separator = ";";
my $symbol_at_the_beginning = "#"; # any symbol works. For real spaces you probably want the whitespace class \s; write it in single quotes: $var = '\s';
my @aggregate_array_of_ports=();
while(my $row = <DATA>){
#ok, read.
chomp($row);
#print "row is : $row\n";
if($row =~ m/^([$symbol_at_the_beginning]*)/){
#print length($1);
$current_count = length($1) / 2; #take number of spaces divided by 2
$row =~ s/^[$symbol_at_the_beginning]+//;
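#Levels come out as 0, 1, 2, 3.
#When the level jumps from 0 straight to 3 (as happens right after a record is printed), 2 extra separators
#are needed, because the level-2 column is split into three columns by prepare_to_input_four_spaces.
#If the current level is not deeper than the previous one, the record built so far is complete and is printed.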
#print"prev : $previous_count and curr : $current_count\n ";
#print"I will write: $to_written\n";
#print "\n PREV: $previous_count --> CURR: $current_count \n";
if($previous_count>=$current_count){
#output here
print "$to_written".$newline_separator."\n";
$previous_count = 0;
$to_written = "";
}
$previous_count = 0 if($previous_count==-1);
#print "$delimiter_between_columns x($current_count-$previous_count)\n";
#print "current: $current_count previous: $previous_count \n";
$to_written .= $delimiter_between_columns x ($current_count - $previous_count + (($current_count-$previous_count)==3?2:0) )."$row";
if ($current_count==($maximum_count-1)){
#print "I input this!: $to_written\n";
$to_written = prepare_to_input_four_spaces($to_written, $delimiter_between_columns);
}
$previous_count = $current_count;
#print"\n";
}
}
print "$to_written".$newline_separator."\n" if $to_written ne ""; # print the last record, which the loop never flushes
sub prepare_to_input_four_spaces{
my $str = shift; #take string
my $delim = shift;
if ($str=~ m/(.+?[>])\s+(\d+)[:]\s+(.+?)$/){
#1st capture group: everything up to and including the first '>' |(.+?[>])|
#then some whitespace |\s+| and the port number |(\d+)|
#then a ':' |[:]| and more whitespace |\s+| before the tail of the string,
#which is captured by |(.+?)$|,
#where $ anchors the match at the end of the string.
$str = $1.$delim.$2.$delim.$3;
}
return $str;
}
=pod
__DATA__
ABC
EFG"123"
HIJK (12345)
LMNOP "Test"
sssstt"123"
QRS
TU"223"
vwx"55"
www"88"
yz:test1
__END__
=cut
__DATA__
ABC
##EFG"123"
####<HIJK> 22: test file
######LMNOP "Test"
######sssstt"123"
QRS
##TU"223"
####<www> 32: test2 file
######yz test1
####<www> 88: test3 file
######rreeeww
######oooiiiii
##PP
##ss
####<qqq> 89: test6 file
######hhhhggg
######bbbbaaa
######cccczzz
######uu test3
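The regex in prepare_to_input_four_spaces is what turns a level-2 line such as "<HIJK> 22: test file" into three columns. A standalone copy (same pattern, using join instead of concatenation) makes it easy to try the split on its own, both on a single line and on a partially built record like the one the answer passes in:

```perl
use strict;
use warnings;

# Standalone copy of the answer's helper: split "<NAME> PORT: description"
# into three fields joined by $delim. Non-matching strings come back unchanged.
sub prepare_to_input_four_spaces {
    my ($str, $delim) = @_;
    if ($str =~ m/(.+?[>])\s+(\d+)[:]\s+(.+?)$/) {
        return join($delim, $1, $2, $3);
    }
    return $str;
}

print prepare_to_input_four_spaces('<HIJK> 22: test file', ','), "\n";
print prepare_to_input_four_spaces('ABC,EFG"123",<HIJK> 22: test file', ','), "\n";
print prepare_to_input_four_spaces('LMNOP "Test"', ','), "\n";
```

Because the lazy .+? backtracks until the rest of the pattern fits, the split lands on the '>' that is followed by the port number and colon.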
Answer 1 (score: 0)
Maybe this will work for you. I just skipped the header and used "|" as the delimiter; you can change it.
> perl -lne 'if(/^[^\#]/){if($.!=1){print "$a"};$a=$_;}else{s/^#*//g;$a.="|$_";}END{print $a}' temp
ABC|EFG"123"|HIJK (12345)|LMNOP "Test"|sssstt"123"
QRS|TU"223"|vwx"55"|www"88"|yz:test1
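The one-liner relies on the lines literally starting with '#'. With real leading spaces, as in the actual file, the same idea (a line without indentation starts a new record; every indented line is appended to it) can be written as a small script. A sketch, assuming spaces or tabs for indentation; group_records is a hypothetical name and the in-memory sample stands in for the real file:

```perl
use strict;
use warnings;

# Hypothetical sketch: join each top-level line (no leading whitespace) with
# all indented lines that follow it, separated by $delim.
sub group_records {
    my ($fh, $delim) = @_;
    my @records;
    while (my $line = <$fh>) {
        chomp $line;
        next if $line =~ /^\s*$/;          # skip blank lines
        if ($line =~ /^\S/) {              # a top-level line starts a new record
            push @records, $line;
        } elsif (@records) {               # indented line: append to current record
            (my $text = $line) =~ s/^\s+//;
            $records[-1] .= $delim . $text;
        }
    }
    return @records;
}

my $sample = "ABC\n  EFG\"123\"\n    <HIJK> 22: test file\nQRS\n  TU\"223\"\n";
open(my $fh, '<', \$sample) or die "Cannot open in-memory file: $!";
print "$_\n" for group_records($fh, '|');
close $fh;
```

Collecting the records in an array avoids the special case for the first line ($. != 1) and the separate END block that the one-liner needs.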