Text file to Excel file for large data

Time: 2018-01-29 05:46:40

Tags: perl

I want to write a script that reads data from a flat file and writes it to Excel. My code is below:
#!/usr/bin/perl
use strict;
use warnings;

use Spreadsheet::WriteExcel;
my $workbook  = Spreadsheet::WriteExcel->new( 'deep.xls' );
my $worksheet = $workbook->add_worksheet();

$worksheet->write( 0, 0, "DEEP" ) ;
$worksheet->write( 0, 1, "RIJU" );
$worksheet->write( 1, 0, "Sukhi" );
$worksheet->write( 1, 1, "Abhilash" );

$workbook->close;

My flat file contains the following:

FILE_NAME                   Start_Timestamp     End_Timestamp   Record Count    Inbound/Outbound
OmahaTran.txt               1/25/2018 3:40      1/25/2018 3:40  90390           Inbound
concord                     1/24/2018 20:50     1/24/2018 20:50 8631            Inbound
iDine:RewardsNetwork 5220   1/24/2018 12:01     1/24/2018 12:04 218985          Outbound
nashville                   1/25/2018 4:30      1/25/2018 4:32  6810            Inbound
nstrans0.20180125           1/25/2018 2:00      1/25/2018 2:00  124573          Inbound

As I am new to Perl, can anyone help me with how to retrieve the "FILE_NAME", "End_Timestamp" and "Record Count" columns from the text file and write them to Excel?

2 answers:

Answer 0: (score: 1)

You can parse the input as a fixed-width file. Once you have the fields, you already know how to write them to Excel...

parse_fixed.pl

#!/usr/bin/env perl

use warnings;
use strict;

my $usage = "usage: $0 file\n";
my $file = $ARGV[0] or die $usage;
-f $file or die $usage;

# Create $workbook and $worksheet objects here.

open my $fh, '<', $file or die "Unable to open '$file' : $!";
while(my $line = <$fh>) {
    chomp($line);
    # Unpack the fields, first field 27 chars, then 19 chars, etc.
    # perldoc -f pack
    my @fields = unpack("A27 A19 A17 A16 A20", $line);

    # Remove leading and trailing whitespace for each field
    # perldoc -f map
    # perldoc perlretut
    my ($file_name, $start, $stop, $record_count, $direction)
        = map { s/^\s+//; s/\s+$//; $_ } @fields;

    print("filename: '$file_name', start: '$start', stop: '$stop', record_count: '$record_count', direction: '$direction'\n");

    # Add $worksheet->write(...) lines for each field here.

}

# Close $workbook here.

Output

perl parse_fixed.pl input

filename: 'FILE_NAME', start: 'Start_Timestamp', stop: 'End_Timestamp', record_count: 'Record Count', direction: 'Inbound/Outbound'
filename: 'OmahaTran.txt', start: '1/25/2018 3:40', stop: '1/25/2018 3:40', record_count: '90390', direction: 'Inbound'
filename: 'concord', start: '1/24/2018 20:50', stop: '1/24/2018 20:50', record_count: '8631', direction: 'Inbound'
filename: 'iDine:RewardsNetwork 5220', start: '1/24/2018 12:01', stop: '1/24/2018 12:04', record_count: '218985', direction: 'Outbound'
filename: 'nashville', start: '1/25/2018 4:30', stop: '1/25/2018 4:32', record_count: '6810', direction: 'Inbound'
filename: 'nstrans0.20180125', start: '1/25/2018 2:00', stop: '1/25/2018 2:00', record_count: '124573', direction: 'Inbound'

Maven.com dealing with fixed-width records

perldoc -f pack
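
To tie the two halves together, here is a minimal sketch that combines the unpack template from parse_fixed.pl with the Spreadsheet::WriteExcel calls from the question and writes only the three requested columns. The sub names (parse_line, write_report), the script name and the output file name 'report.xls' are assumptions made for illustration, not part of the original answer.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

use Spreadsheet::WriteExcel;

# Split one fixed-width line into its five fields, using the same
# unpack template as parse_fixed.pl, and trim leading whitespace
# (the 'A' format already strips trailing whitespace).
sub parse_line {
    my ($line) = @_;
    my @fields = unpack 'A27 A19 A17 A16 A20', $line;
    s/^\s+// for @fields;
    return @fields;
}

# Copy FILE_NAME, End_Timestamp and Record Count into a worksheet.
# The sub name and both path arguments are assumptions for this sketch.
sub write_report {
    my ( $in_path, $xls_path ) = @_;

    open my $fh, '<', $in_path or die "Unable to open '$in_path': $!";

    my $workbook = Spreadsheet::WriteExcel->new($xls_path);
    die "Unable to create '$xls_path': $!" unless defined $workbook;
    my $worksheet = $workbook->add_worksheet();

    my $row = 0;
    while ( my $line = <$fh> ) {
        next unless $line =~ /\S/;    # skip blank lines
        my ( $file_name, undef, $stop, $record_count ) = parse_line($line);

        $worksheet->write_string( $row, 0, $file_name );
        $worksheet->write_string( $row, 1, $stop );
        $worksheet->write( $row, 2, $record_count );    # number or header text
        $row++;
    }

    close $fh;
    $workbook->close;
    return $row;    # number of rows written, header included
}

# Assumed invocation: perl flat_to_xls.pl input.txt
write_report( $ARGV[0], 'report.xls' ) if @ARGV;
```

Note that Spreadsheet::WriteExcel produces the older .xls format, which is limited to 65,536 rows per worksheet. If the input is larger than that, Excel::Writer::XLSX offers a largely compatible interface and may be worth trying instead.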

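Answer 1: (score: 1)

Here is a pattern I use to convert fixed-width fields to comma-separated values. Excel will, of course, happily import this CSV data, which does most of the work for you.

It assumes that each field extends from the start of one header string to the start of the next, and uses the built-in array @- to determine where each string starts. A header string may contain single spaces; multiple consecutive spaces terminate the string.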
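
As a small illustration of how @- behaves, the sketch below runs the same header regex over a shortened, made-up header and records $-[0], the offset at which each match began. The header string and its column widths (12 and 18) are assumptions chosen only to keep the example short.

```perl
use strict;
use warnings;
use feature 'say';

# A made-up header: columns start at offsets 0, 12 and 30.
my $head = sprintf '%-12s%-18s%s', 'FILE_NAME', 'Start_Timestamp', 'Record Count';

my @starts;
while ( $head =~ / ( \S+ (?: [ ] \S+ )* ) /xg ) {
    push @starts, $-[0];    # $-[0] is the start offset of the whole match
    say "'$1' starts at offset $-[0]";
}
```

The difference between consecutive offsets is the width of a field, which is exactly what an unpack template needs.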

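I hope it is clear that the value of $template is printed only for diagnostics and is not part of the CSV data.

It is a simple matter to remove the statement that prints the comma-separated header strings if they are not wanted. Alternatively, it is just as trivial to delete that row from the spreadsheet after importing.

The DATA file handle is used for convenience and demonstration purposes. Normally you would want to open a specific file and use that file handle, or just use <> to read the files specified as command-line arguments.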

use strict;
use warnings 'all';
use feature 'say';

my $head;

my $template = do {

    $head = <DATA>;

    my @template;
    my $prev;

    while ( $head =~ / \S+ (?: [ ] \S+ )* /xg ) {
        push @template, defined $prev ? 'A' . ( $-[0] - $prev ) : '@' . $-[0];
        $prev = $-[0];
    }

    push @template, 'A*';

    "@template";
};

say qq{Pack format "$template"\n};

say join ',', unpack $template, $head;

while ( <DATA> ) {
    say join ',', unpack $template, $_;
}


__DATA__
FILE_NAME                   Start_Timestamp     End_Timestamp   Record Count    Inbound/Outbound
OmahaTran.txt               1/25/2018 3:40      1/25/2018 3:40  90390           Inbound
concord                     1/24/2018 20:50     1/24/2018 20:50 8631            Inbound
iDine:RewardsNetwork 5220   1/24/2018 12:01     1/24/2018 12:04 218985          Outbound
nashville                   1/25/2018 4:30      1/25/2018 4:32  6810            Inbound
nstrans0.20180125           1/25/2018 2:00      1/25/2018 2:00  124573          Inbound

Output

Pack format "@0 A28 A20 A16 A16 A*"

FILE_NAME,Start_Timestamp,End_Timestamp,Record Count,Inbound/Outbound
OmahaTran.txt,1/25/2018 3:40,1/25/2018 3:40,90390,Inbound
concord,1/24/2018 20:50,1/24/2018 20:50,8631,Inbound
iDine:RewardsNetwork 5220,1/24/2018 12:01,1/24/2018 12:04,218985,Outbound
nashville,1/25/2018 4:30,1/25/2018 4:32,6810,Inbound
nstrans0.20180125,1/25/2018 2:00,1/25/2018 2:00,124573,Inbound
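
For a real input file rather than DATA, the same idea might be adapted along these lines. This is a sketch under assumptions: the sub names (template_from_header, csv_field, convert), the script name and the choice of columns 0, 2 and 3 (FILE_NAME, End_Timestamp, Record Count) are illustrative. It also quotes any field that contains a comma or double quote, which the plain join above does not do.

```perl
use strict;
use warnings;

# Build an unpack template from the header line, as in the code above.
sub template_from_header {
    my ($head) = @_;
    my ( @template, $prev );
    while ( $head =~ / \S+ (?: [ ] \S+ )* /xg ) {
        push @template, defined $prev ? 'A' . ( $-[0] - $prev ) : '@' . $-[0];
        $prev = $-[0];
    }
    return join ' ', @template, 'A*';
}

# Quote a field only when it contains a comma, double quote or line break.
sub csv_field {
    my ($field) = @_;
    return $field unless $field =~ /[",\r\n]/;
    ( my $quoted = $field ) =~ s/"/""/g;
    return qq{"$quoted"};
}

# Read fixed-width lines from $in and print selected columns as CSV to $out.
# The first non-blank line is treated as the header and defines the template.
sub convert {
    my ( $in, $out ) = @_;
    my @wanted = ( 0, 2, 3 );    # FILE_NAME, End_Timestamp, Record Count
    my $template;
    while ( my $line = <$in> ) {
        next unless $line =~ /\S/;
        $template //= template_from_header($line);
        my @fields = unpack $template, $line;
        print {$out} join( ',', map { csv_field($_) } @fields[@wanted] ), "\n";
    }
    return;
}

# Assumed invocation: perl fixed_to_csv.pl input.txt > output.csv
if (@ARGV) {
    open my $in, '<', $ARGV[0] or die "Unable to open '$ARGV[0]': $!";
    convert( $in, \*STDOUT );
    close $in;
}
```

Because the loop handles one line at a time, memory use should stay flat even for large inputs.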