I have a text file that looks like this:
a1: sample1
b1: sample2
c1: sample3
d1: sample4
sample5
sample0
a1: sample_1
b1: sample_2
c1: sample_3
d1: sample_4
sample_5
a1: sample_11
b1: sample_22
c1: sample_33
d1: sample_44
I need to convert it to a CSV that can be opened in Excel. The final output should look like this:
a1, b1, c1, d1
sample1,sample2,sample3,"sample4 sample5"
sample_1,sample_2,sample_3,"sample_4 sample_5"
sample_11,sample_22,sample_33,"sample_44 sample_55"
sample4, sample5 and sample0 all belong to d1, i.e. to a single row. So basically d1 will be one cell that holds three values, like this:
a1        b1        c1        d1        row0
sample1   sample2   sample3   sample4   row1
                              sample5   row1
                              sample0   row1
sample_1  sample_2  sample_3  sample_4  row2
                              sample_5  row2
Here d1 is a single cell that now holds two values.
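I am able to parse the text file and get the values I need, but I cannot get column d1 to come out as required. How can I do this?
Does this need a Perl script? Any suggestions?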
open(my $file, '<', 'f1.txt') or die "Cannot open f1.txt: $!";
open(my $csv, '>', 'f2.csv') or die "Cannot open f2.csv: $!";
while (my $line = <$file>) {
    chomp $line;
    if ($line =~ /a1/) {
        my @arr1 = split(/:/, $line);
        print $csv "$arr1[1],";
    }
    if ($line =~ /b1/) {
        my @arr2 = split(/:/, $line);
        print $csv "$arr2[1],";
    }
}
close($file);
close($csv);
This is my code so far.
Answer 0: (score: 0)
Suppose you have the contents of the file in a scalar like this:
my $input = "a1: sample1
b1: sample2
c1: sample3
d1: sample4, sample5
a1: sample_1
b1: sample_2
c1: sample_3
d1: sample_4, sample_5
a1: sample_11
b1: sample_22
c1: sample_33
d1: sample_44, sample_55";
Then you can use a few regular expressions (as long as the input matches your description):
## Join each group of four lines into one quoted CSV row (assumes no blank lines)
$input =~ s/([^\n]+)\n([^\n]+)\n([^\n]+)\n([^\n]+)/"$1","$2","$3","$4"/msg;
## Remove the "a1: " style prefixes
$input =~ s/[a-z]\d+:\s*//ig;
## Remove commas inside a field, i.e. those not adjacent to a double quote
$input =~ s/(?<!"),(?!")//ig;
## Finally, print the header followed by the rows
print '"a1","b1","c1","d1"'. "\n$input";
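The substitutions above expect each d1 value to already sit on one line, whereas the file in the question puts the extra values on separate lines with no `key:` prefix. A minimal sketch of a pre-processing step that folds such lines into the previous one; the helper name `fold_continuations` is hypothetical, and it assumes every labelled line starts with a letter, digits and a colon (as in `a1:`):

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the answer): join every line that
# lacks a "key:" prefix onto the previous line, separated by ", ".
sub fold_continuations {
    my ($text) = @_;
    # A newline that is NOT followed by something like "a1:" but IS
    # followed by more text on the same line marks a continuation.
    $text =~ s/\n(?![a-z]\d+:)(?=.)/, /gi;
    return $text;
}

my $raw = "a1: sample1\nb1: sample2\nc1: sample3\nd1: sample4\nsample5\nsample0\na1: sample_1";
my $folded = fold_continuations($raw);
print "$folded\n";
```

After this step the d1 line reads `d1: sample4, sample5, sample0`, which is the shape the substitutions above assume.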
Answer 1: (score: 0)
Perhaps the following will help:
use strict;
use warnings;
local ( $/, $" ) = ( '', ',' );
print "a1,b1,c1,d1\n";
while (<>) {
my @fields = map { /:\s+(.+)/; $1 } split /\n/;
print qq/@fields[ 0 .. 2 ],"$fields[3]"\n/;
}
Command-line usage: perl script.pl inFile > outFile
Output on your dataset:
a1,b1,c1,d1
sample1,sample2,sample3,"sample4, sample5"
sample_1,sample_2,sample_3,"sample_4, sample_5"
sample_11,sample_22,sample_33,"sample_44, sample_55"
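The script sets $/ = '' for paragraph mode, so it reads one blank-line-separated block at a time. It splits each block on newlines, then uses a regular expression to capture the required field text. Double quotes are placed around the last field, and the array slice is interpolated with commas between the fields because of the earlier $" = ','.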
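The paragraph-mode idea can be sketched on in-memory data as follows. Unlike the script above, this variant also appends lines without a `key:` prefix to the previous field and quotes every field. The sample data, the blank line between records, and the name `records_to_csv` are assumptions made for illustration:

```perl
use strict;
use warnings;

# Hypothetical helper: turn blank-line-separated records into CSV rows.
sub records_to_csv {
    my ($text) = @_;
    my @rows;
    open(my $in, '<', \$text) or die "Cannot read in-memory data: $!";
    {
        local $/ = '';    # paragraph mode: one record per read
        while (my $block = <$in>) {
            my @fields;
            for my $line (split /\n/, $block) {
                if ($line =~ /^\w+:\s*(.*)/) {
                    push @fields, $1;           # labelled line starts a new field
                }
                elsif (@fields) {
                    $fields[-1] .= " $line";    # unlabelled line continues the last field
                }
            }
            # Quote every field (doubling embedded quotes) so that spaces
            # or commas inside a value stay in one cell.
            push @rows, join(',', map { my $f = $_; $f =~ s/"/""/g; qq("$f") } @fields);
        }
    }
    close($in);
    return join("\n", @rows) . "\n";
}

my $data = "a1: sample1\nb1: sample2\nc1: sample3\nd1: sample4\nsample5\nsample0\n\n"
         . "a1: sample_1\nb1: sample_2\nc1: sample_3\nd1: sample_4\nsample_5\n";
print qq("a1","b1","c1","d1"\n), records_to_csv($data);
```

Reading from a reference to a scalar keeps the sketch self-contained; with a real file the same loop would read from a normal filehandle.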
Answer 2: (score: 0)
This is how it should be done:
use strict;
use warnings;
use Data::Dumper;
open(my $TXT, "<", 'inabcd.txt') or die "Could not open inabcd.txt: $!";
open(my $CSV, ">", "outabcd.csv") or die "Could not open outabcd.csv: $!";
my $rowcount = 0;
my %h = ();
while(my $line = <$TXT>) {
if($line =~ /^$/) {
next;
}
chomp($line);
my ($key, @data) = split(':',$line);
if (exists $h{$key}) {
$rowcount = $h{$key}->{'rowcount'};
$rowcount++;
}
$h{$key}->{$rowcount} = \@data;
$h{$key}->{'rowcount'} = $rowcount;
}
my @header = ();
foreach my $el (sort keys %h) {    # sort: hash key order is not stable
if($el ne 'rowcount') {
push(@header, $el);
}
}
my $header = join(',', @header);
print $CSV "$header". "\n";
my $r = 0;
while($r <= $rowcount) {
foreach my $e (@header) {
print("@{$h{$e}->{$r}}" . ",");
print $CSV "@{$h{$e}->{$r}}" . ",";
}
print $CSV "\n";
$r++;
}
close($TXT);
close($CSV);
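For comparison, here is a minimal line-by-line sketch that does not rely on blank lines between records: a key that is already present in the current row starts a new row, and a line without a key is appended to the previous value. The helper name `lines_to_csv` and the rule for detecting a new row are assumptions, not something taken from the code above:

```perl
use strict;
use warnings;

# Hypothetical helper: build CSV text from "key: value" lines, where a
# line without a key continues the previous value.
sub lines_to_csv {
    my (@lines) = @_;
    my (@header, %seen, @rows, $current, $last_key);
    for my $line (@lines) {
        chomp $line;
        next if $line =~ /^\s*$/;    # skip blank lines
        if ($line =~ /^(\w+):\s*(.*)$/) {
            my ($key, $value) = ($1, $2);
            push @header, $key unless $seen{$key}++;    # keep first-seen column order
            # A key already present in the current row starts a new row.
            if (!$current || exists $current->{$key}) {
                $current = {};
                push @rows, $current;
            }
            $current->{$key} = $value;
            $last_key = $key;
        }
        elsif ($current) {
            $current->{$last_key} .= " $line";    # continuation line
        }
    }
    my $quote = sub {
        my ($field) = @_;
        $field = '' unless defined $field;    # a missing key gives an empty cell
        $field =~ s/"/""/g;                   # double any embedded quotes
        return qq("$field");
    };
    my $csv = join(',', map { $quote->($_) } @header) . "\n";
    for my $row (@rows) {
        $csv .= join(',', map { $quote->($row->{$_}) } @header) . "\n";
    }
    return $csv;
}

my @input = ("a1: sample1", "b1: sample2", "c1: sample3", "d1: sample4",
             "sample5", "sample0", "a1: sample_1", "b1: sample_2",
             "c1: sample_3", "d1: sample_4", "sample_5");
print lines_to_csv(@input);
```

Keeping the header in first-seen order avoids depending on hash key order, and quoting every field keeps a multi-value d1 in a single cell when the file is opened in Excel.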