Using grep and awk to transfer data from .srt to .csv / xls

Time: 2015-08-21 04:25:15

Tags: shell csv awk grep srt

I have an interesting project to do! I'm thinking about converting an srt file into a csv / xls file.

An srt file looks like this:

1
00:00:00,104 --> 00:00:02,669
Hi, I'm shell-scripting.

2
00:00:02,982 --> 00:00:04,965
I'm not sure if it would work,
but I'll try it!

3
00:00:05,085 --> 00:00:07,321
There must be a way to do it!

I would like to output it to a csv file like this:

"1","00:00:00,104","00:00:02,669","Hi, I'm shell-scripting."   
"2","00:00:02,982","00:00:04,965","I'm not sure if it would work"
,,,"but I'll try it!"
"3","00:00:05,085","00:00:07,321","There must be a way to do it!"

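As you can see, a subtitle can take up two lines. My idea is to use grep to put the srt data into an xls file, and then use awk to format the xls file.

What do you think? How should I write it? I tried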

$grep filename.srt > filename.xls

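but it seems that all the data, including the timecodes and the subtitle text, ends up in column A of the xls file... whereas I want the text in column B... How can awk help with the formatting?

Thanks in advance! :)

4 Answers:

Answer 0 (score: 4)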

$ cat tst.awk
BEGIN { RS=""; FS="\n"; OFS=","; q="\""; s=q OFS q }
{
    split($2,a,/ .* /)
    print q $1 s a[1] s a[2] s $3 q
    for (i=4;i<=NF;i++) {
        print "", "", "", q $i q
    }
}

$ awk -f tst.awk file
"1","00:00:00,104","00:00:02,669","Hi, I'm shell-scripting."
"2","00:00:02,982","00:00:04,965","I'm not sure if it would work,"
,,,"but I'll try it!"
"3","00:00:05,085","00:00:07,321","There must be a way to do it!"
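The script above wraps every field in double quotes, but a subtitle that itself contains a double quote would produce a malformed CSV row. A minimal sketch of one way to handle that, assuming the usual CSV convention of doubling embedded quotes; `srt_to_csv` and `csvq` are hypothetical names introduced here, not part of the answer:

```shell
# srt_to_csv is a hypothetical helper name, not part of the answer above.
srt_to_csv() {
    awk '
    # Wrap a value in double quotes, doubling any quotes it already contains.
    function csvq(s) { gsub(/"/, "\"\"", s); return "\"" s "\"" }
    BEGIN { RS = ""; FS = "\n"; OFS = "," }
    {
        split($2, t, / --> /)                     # t[1] = start, t[2] = end
        print csvq($1), csvq(t[1]), csvq(t[2]), csvq($3)
        for (i = 4; i <= NF; i++)                 # continuation lines of the subtitle
            print "", "", "", csvq($i)
    }' "$@"
}

printf '1\n00:00:00,104 --> 00:00:02,669\nShe said "hi"\n' | srt_to_csv
# prints: "1","00:00:00,104","00:00:02,669","She said ""hi"""
```

The layout of the rows is the same as in the answer; only the quoting of each field changes.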

Answer 1 (score: 1)

I think something like this should do nicely:

awk -v RS= -F'\n' '
   { 
      sub(" --> ","\x7c",$2)                 # change "-->" to "|"
      printf "%s|%s|%s\n",$1,$2,$3           # print scene, time start, time stop, description
      for(i=4;i<=NF;i++)printf "|||%s\n",$i  # print remaining lines of description
   }' file.srt

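-v RS= sets the record separator to blank lines, so each subtitle block is read as one record. -F'\n' sets the field separator to newline, so each line of the block is one field.

sub() replaces " --> " with a pipe symbol (|).

The first three fields are then printed separated by pipes, followed by a small loop that prints any remaining lines of the description, each prefixed with three pipe symbols so that they line up.

Output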

1|00:00:00,104|00:00:02,669|Hi, I'm shell-scripting.
2|00:00:02,982|00:00:04,965|I'm not sure if it would work,
|||but I'll try it!
3|00:00:05,085|00:00:07,321|There must be a way to do it!
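To see the effect of `-v RS=` and `-F'\n'` in isolation, here is a small sketch that only reports how many lines (fields) each subtitle block (record) contains. The helper name `count_fields` and the shortened sample timecodes are made up for illustration:

```shell
# count_fields is a hypothetical helper: print "record number: field count"
# for every blank-line-separated block on standard input or in the given files.
count_fields() {
    awk -v RS= -F'\n' '{ print NR ": " NF }' "$@"
}

# Two blocks: the first has 3 lines, the second has 4 (a two-line subtitle).
printf '1\nt1 --> t2\nline a\n\n2\nt3 --> t4\nline b\nline c\n' | count_fields
# prints:
# 1: 3
# 2: 4
```

A block with more than three fields is exactly the case the `for` loop in the awk command handles.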

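Since I have more fun with Perl and Excel, I took the output above, parsed it in Perl, and wrote a real Excel XLSX file. Of course, there is no need to use both awk and Perl, so ideally I would redo the awk part in Perl and integrate the two, since Perl can write Excel files whereas awk cannot. Anyway, here is the Perl.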

#!/usr/bin/perl
use strict;
use warnings;

use Excel::Writer::XLSX;
my $DEBUG=0; 
my $workbook  = Excel::Writer::XLSX->new('result.xlsx');
my $worksheet = $workbook->add_worksheet();
my $row=0; 

while(my $line=<>){
   $row++;                                   # move down a line in Excel worksheet
   chomp $line;                              # strip trailing newline
   my @f=split /\|/, $line;                  # split fields of line into array @f, on pipe symbols (|)
   for(my $j=0;$j<scalar @f;$j++){           # loop through all fields
     my $cell= chr(65+$j) . $row;            # calculate Excel cell, starting at A1 (65="A")
     $worksheet->write($cell,$f[$j]);        # write to spreadsheet
     printf "%s:%s ",$cell,$f[$j] if $DEBUG;
   }
   printf "\n" if $DEBUG;
}

$workbook->close;


Answer 2 (score: 1)

My other answer is half awk and half Perl, but given that awk cannot write Excel spreadsheets whereas Perl can, it seems daft to require you to master both awk and Perl when Perl is perfectly capable of doing everything on its own. So here it is in Perl:

#!/usr/bin/perl
use strict;
use warnings;

use Excel::Writer::XLSX;
my $workbook  = Excel::Writer::XLSX->new('result.xlsx');
my $worksheet = $workbook->add_worksheet();
my $ExcelRow=0;

local $/ = "";   # set paragraph mode, so we read till next blank line as one record

while(my $para=<>){
   $ExcelRow++;                                   # move down a line in Excel worksheet
   chomp $para;                                   # strip trailing newlines
   my @lines=split /\n/, $para;                   # split paragraph into lines on linefeed character
   my $scene = $lines[0];                         # pick up scene number from first line of para
   my ($start,$end)=split / --> /,$lines[1];      # pick up start and end time from second line
   my $cell=sprintf("A%d",$ExcelRow);             # work out cell
   $worksheet->write($cell,$scene);               # write scene to spreadsheet column A
   $cell=sprintf("B%d",$ExcelRow);                # work out cell
   $worksheet->write($cell,$start);               # write start time to spreadsheet column B
   $cell=sprintf("C%d",$ExcelRow);                # work out cell
   $worksheet->write($cell,$end);                 # write end time to spreadsheet column C
   $cell=sprintf("D%d",$ExcelRow);                # work out cell
   $worksheet->write($cell,$lines[2]);            # write description to spreadsheet column D
   for(my $i=3;$i<scalar @lines;$i++){            # output additional lines of description
      $ExcelRow++;
      $cell=sprintf("D%d",$ExcelRow);             # work out cell
      $worksheet->write($cell,$lines[$i]);
   }
}

$workbook->close;

Save the above in a file called srt2xls, then make it executable with:

chmod +x srt2xls

Then you can run it with

./srt2xls < SomeFile.srt

and it will give you a spreadsheet called result.xlsx

Answer 3 (score: 0)

Since you want to convert srt to csv, here is an awk command:

 awk '{gsub(" --> ","\x22,\x22");if(NF!=0){if(j<3)k=k"\x22"$0"\x22,";else{k="\x22"$0"\x22 ";l=1}j=j+1}else j=0;if(j==3){print k;k=""}if(l==1){print ",,,"k ;l=0;k=""}}' inputfile > output.csv

The same awk command, expanded for readability:

awk '{
       gsub(" --> ","\x22,\x22"); 
       if(NF!=0)
         {
           if(j<3)
              k=k"\x22"$0"\x22,";
           else
            {
              k="\x22"$0"\x22 ";
              l=1
            }
          j=j+1
         }
        else
          j=0;
        if(j==3)
          { 
            print k;
            k=""
          }
        if(l==1)
          {
            print ",,,"k;
            l=0;
            k=""
          }
    }' inputfile > output.csv

Copy output.csv to a Windows machine, open it with Microsoft Excel, and save it with the .xls extension.
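One thing to watch for when Windows is involved: an .srt file saved there may use CRLF line endings. In that case the separator lines are not truly empty (they hold a carriage return), so a test such as NF!=0 or a blank-line record separator will not see the block boundaries. A minimal sketch of stripping the carriage returns first, assuming `tr` is available; `normalize_srt` is a hypothetical helper name:

```shell
# normalize_srt is a hypothetical helper: remove carriage returns so that
# the lines separating subtitle blocks are genuinely empty.
normalize_srt() {
    tr -d '\r'
}

# A CRLF sample with two blocks; after normalizing, awk sees 3 lines in each.
printf '1\r\n00:00:00,104 --> 00:00:02,669\r\nHi\r\n\r\n2\r\n00:00:02,982 --> 00:00:04,965\r\nBye\r\n' |
    normalize_srt |
    awk -v RS= -F'\n' '{ print NR ": " NF " lines" }'
# prints:
# 1: 3 lines
# 2: 3 lines
```

The normalized stream can be piped into any of the awk commands in this thread instead of passing the file name directly.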