Question

我有一个文本文件，其中文件内容在开头有空格分隔符。如下所示：

第一行开头没有任何空格。
第二行有2个空格。
第三行开头有4个空格。
第四行开头有6个空格。

此模式再次以随机方式重复到文件末尾，如下文中的文本文件所示。

我想从文本文件中读取这些行并将这些行保存为pattern：

第一列没有空格。
在第二列中有2个空格。
第三列中有4个空格。
CSV文件第四列中的6个空格。

文本文件结构是（用＃表示空格）：

ABC
##EFG"123"
####<HIJK> 22: test file
######LMNOP "Test"
######sssstt"123"
QRS
##TU"223"
####<www> 32: test2 file
######yz test1
####<www> 88: test3 file
######rreeeww
######oooiiiii
##PP
##ss
####<qqq> 89: test6 file
######hhhhggg
######bbbbaaa
######cccczzz
######uu test3

预期输出图像： output

我是Perl的新手，我知道如何打开文件并通读行，但我不了解如何在CSV列中存储这种结构。

my $file = 'C:\\outputfile.txt';
open(my $fh, '<:encoding(UTF-8)', $file) or die "Could not open file '$file' $!"; 
while (my $row = <$fh>) { # reading each row till end of file
    chomp $row;  
    //what should be done here ?
}

请帮忙。

Answer 1

如果您对代码有疑问，我会说：是的，我可以回答，但这不是好的或是Perl代码的最佳示例。快写。

my $previous_count = "-1"; #beginning, we will think, that no spaces.
my $current_count = "0";  #current default value
my $maximum_count = 3; #u say so
my $to_written = "";
my $delimiter_between_columns = ",";
my $newline_separator = ";";
my $symbol_at_the_beginning = "#"; #input any symbol. But I suppose, you want "\s" <- whitespace' symbol class. input it like this: $var = "\s";
my @aggregate_array_of_ports=();
while(my $row = <DATA>){
#ok, read.    
chomp($row);    
#print "row is : $row\n";
if($row =~ m/^([$symbol_at_the_beginning]*)/){
   #print length($1);
   $current_count = length($1) / 2; #take number of spaces divided by 2
   $row =~ s/^[$symbol_at_the_beginning]+//;
   #hint here, we can get counts as 0,1,2,3 <-see? 
   #if you take first and third word, you need to add 2 separators.
   #OR if you take count with LESSER then previous count, it mean, that you need output       
   #print"prev : $previous_count   and    curr : $current_count\n  ";
   #print"I will write: $to_written\n";
   #print "\n PREV: $previous_count --> CURR: $current_count \n";
   if($previous_count>=$current_count){
      #output here
      print "$to_written".$newline_separator."\n";
      $previous_count = 0;
      $to_written = "";
   }
   $previous_count = 0 if($previous_count==-1);   

   #print "$delimiter_between_columns x($current_count-$previous_count)\n";
   #print "current: $current_count previous: $previous_count \n";
   $to_written .= $delimiter_between_columns x ($current_count - $previous_count + (($current_count-$previous_count)==3?2:0) )."$row";
   if ($current_count==($maximum_count-1)){
    #print "I input this!: $to_written\n";
    $to_written = prepare_to_input_four_spaces($to_written, $delimiter_between_columns);    
    }
   $previous_count = $current_count;
   #print"\n";
}
}
  #print "$to_written".$newline_separator."\n";
sub prepare_to_input_four_spaces{
    my $str = shift; #take string
    my $delim = shift;
    if ($str=~ m/(.+?[>])\s+(\d+)[:]\s+(.+?)$/){
    #here I want to find first capture group before [>] (also it includes) |(.+?[>])|
    #next, some spaces |\s+| and I want to catch port |(\d+)|.
    #next, |[:]| symbol and some spaces again |\s+| before the tail of the string.
    #and will catch this tail: |(.+?)$|.
    #where $ mean the right "border" of the string (really - end of the string)
        $str = $1.$delim.$2.$delim.$3;
        }
        return $str;
    }

=pod
__DATA__
ABC
  EFG"123"
    HIJK (12345)
      LMNOP "Test"
      sssstt"123"
QRS
  TU"223"
    vwx"55"
    www"88"
      yz:test1
__END__
=cut
__DATA__
ABC
##EFG"123"
####<HIJK> 22: test file
######LMNOP "Test"
######sssstt"123"
QRS
##TU"223"
####<www> 32: test2 file
######yz test1
####<www> 88: test3 file
######rreeeww
######oooiiiii
##PP
##ss
####<qqq> 89: test6 file
######hhhhggg
######bbbbaaa
######cccczzz
######uu test3

Answer 2

可能这对你没问题：我只是跳过了标题并将分隔符设为“|”。你可以改变它。

> perl -lne 'if(/^[^\#]/){if($.!=1){print "$a"};$a=$_;}else{s/^#*//g;$a.="|$_";}END{print $a}' temp
ABC|EFG"123"|HIJK (12345)|LMNOP "Test"|sssstt"123"
QRS|TU"223"|vwx"55"|www"88"|yz:test1

读取每行文本文件并使用Perl存储到CSV文件中的列中

2 个答案: