How do I apply this Perl expression to every line of a file?

Time: 2015-08-25 19:07:49

Tags: bash perl shell

I'm using Mac 10.9.5, the bash shell, and perl 5, version 16, subversion 3 (v5.16.3). I have the following script:

#!/bin/bash
perl -pi -e "s/([^,]+?),([^,]+?),([^,]+?),([^,]+?),([^,]+?),([^,]+?),([^,]+?),([^,]+?),([^,]+?),([^,]+?),([^,]+?),([^,]+?),([^,]+?),([^,]+?),([^,]+?),([^,]+?),([^,]+?),([^,]+?),([^,]+?),([^,]+?),([^,]+?),([^,]+?),([^,]+?),([^,]+?),([^,]+?),([^,]+?),([^,]+?),([^,]+?)/REPLACE INTO student (ID, SIS_ID, STUDENT_NUM, USER_ID, OTHER_USER_ID) VALUES (REPLACE(uuid(), '-', ''), '\$24', '\$26', '\$2', '\$27');/g" $1 

However, when I run the script against a file:

 sh myscript.sh ~/Downloads/myfile.csv

the substitution is applied only to the first line of the file rather than to every line, even though the file has thousands of lines:

davea$ wc -l ~/Downloads/myfile.csv
91552 /Users/davea/Downloads/myfile.csv

How do I adjust the script so that the search and replace is applied to every line of the file?

Edit: Here is a sample of the file I'm passing in as input:

 app.app.first_name,app.app.id,app.app.last_name,app.app.max_time,app.app.url,app.app.user_name,thirdparty.created,thirdparty.district,thirdparty.dob,thirdparty.ell_status,thirdparty.email,thirdparty.frl_status,thirdparty.gender,thirdparty.grade,thirdparty.hispanic_ethnicity,thirdparty.iep_status,thirdparty.last_modified,thirdparty.location.zip,thirdparty.name.first,thirdparty.name.last,thirdparty.name.middle,thirdparty.race,thirdparty.school,thirdparty.sis_id,thirdparty.state_id,thirdparty.student_number,thirdparty.id,matchmaker_result
 FirstName,0040FBA053464647BD51141EECF4437F,LastName,2014-09-15 20:46:11,cityunifiedca.springboardonline.org,mlastname,2014-04-04T23:03:29.916Z,51e76ab1d93412f47b000c32,6/12/2000,,,Paid,F,10,Y,Y,2015-08-19T21:33:13.989Z,90033-1803,FIRSTNAME,LASTNAME,A,Caucasian,51f811478a86244d2900033f,061200F010,6124939964,061200F010,533f3a412a1f1fea24c8e164,match

And here is the output from running the script above:
 REPLACE INTO student (ID, SIS_ID, STUDENT_NUM, USER_ID, OTHER_USER_ID) VALUES (REPLACE(uuid(), '-', ''), 'thirdparty.sis_id', 'thirdparty.student_number', 'app.app.id', 'thirdparty.id');atchmaker_result
 FirstName,0040FBA053464647BD51141EECF4437F,LastName,2014-09-15 20:46:11,cityunifiedca.springboardonline.org,mlastname,2014-04-04T23:03:29.916Z,51e76ab1d93412f47b000c32,6/12/2000,,,Paid,F,10,Y,Y,2015-08-19T21:33:13.989Z,90033-1803,FIRSTNAME,LASTNAME,A,Caucasian,51f811478a86244d2900033f,061200F010,6124939964,061200F010,533f3a412a1f1fea24c8e164,match

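3 Answers:

Answer 0: (score: 2)

Supply the path to the input file as the first command-line argument.

Note: the array indices may be off, since I simply took your regex match variables and shifted them down by one (i.e., I have not tested this code).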

use strict;
use warnings;

use Text::CSV;

my $csv = Text::CSV->new({ binary => 1 }) or die Text::CSV->error_diag;
open(my $fh, '<', $ARGV[0]) or die $!;

while (my $row = $csv->getline($fh)) {
    print "REPLACE INTO student (ID, SIS_ID, STUDENT_NUM, USER_ID, OTHER_USER_ID) VALUES (REPLACE(uuid(), '-', ''), '$row->[23]', '$row->[25]', '$row->[1]', '$row->[26]');\n";
}

$csv->eof or $csv->error_diag;
close($fh);

Answer 1: (score: 1)

Let's first turn the script into a proper Perl script; one-liners are meant for the command line.

#!/usr/bin/perl
# example code from `man perlrun`

use warnings;
use strict;
my $extension = '.orig';
my $oldargv = '';   # initialized so the first "ne" comparison does not warn
my $backup;
my $subre = "([^,]+?)";
my $bigre = "$subre," x 27 . $subre;
my $presub = "REPLACE INTO student (ID, SIS_ID, STUDENT_NUM, USER_ID, OTHER_USER_ID) VALUES (REPLACE(uuid(), '-', '')";
LINE: while (<>) {
if ($ARGV ne $oldargv) {
    if ($extension !~ /\*/) {
      $backup = $ARGV . $extension;
    } else {
      ($backup = $extension) =~ s/\*/$ARGV/g;
    }
    rename($ARGV, $backup);
    open(ARGVOUT, ">$ARGV");
    select(ARGVOUT);
    $oldargv = $ARGV;
}
  # No backslash before $24 etc. here: the escaping was only needed inside the shell's double quotes.
  s/$bigre/$presub, '$24', '$26', '$2', '$27');/g;
} continue {
  print;    # this prints to original filename
}
select(STDOUT);

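Next, look at that regex. A line may contain empty fields (,,), and [^,]+? requires at least one character per field. The sample data line has exactly that (6/12/2000,,,Paid), so it never contains 28 consecutive non-empty fields and is left unchanged, while the header line matches. You could fix the regex, but a regex is the wrong tool here. Let's replace the s/// line in the script above with:

  my @f = split /,/;
  $_ = "$presub, '$f[23]', '$f[25]', '$f[1]', '$f[26]');\n";

This assumes no field contains a comma inside a quoted or escaped value. To handle all of that, use Text::CSV as Matt Jacob shows. Similar caveats apply to my code.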

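Or, if you must, you can stick with the regex, but drop the g modifier, anchor it to the start and end of the line, and allow the capture groups to be empty. Anchoring also stops the lazy final group from matching a single character, which is what left "atchmaker_result" behind in your output.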

s/^([^,]*?),([^,]*?),([^,]*?),([^,]*?),([^,]*?),([^,]*?),([^,]*?),([^,]*?),([^,]*?),([^,]*?),([^,]*?),([^,]*?),([^,]*?),([^,]*?),([^,]*?),([^,]*?),([^,]*?),([^,]*?),([^,]*?),([^,]*?),([^,]*?),([^,]*?),([^,]*?),([^,]*?),([^,]*?),([^,]*?),([^,]*?),([^,]*?)$/REPLACE INTO student (ID, SIS_ID, STUDENT_NUM, USER_ID, OTHER_USER_ID) VALUES (REPLACE(uuid(), '-', ''), '\$24', '\$26', '\$2', '\$27');/;

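If you remove the mg flags, this does not time out on regex101.com, and it captures the fields correctly for the sample input when the $ markers are supplied in the replacement.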

Or modify the first script above by changing these lines:

my $subre = "([^,]*?)";
my $bigre = '^' . "$subre," x 27 . $subre . '$';
...
s/$bigre/$presub, '$24', '$26', '$2', '$27');/;

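Answer 2: (score: 1)

Your s/// seems to match only the first line. Not sure why. In any case, that is a ridiculous regex. What you want is to split each line on commas into a list: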

perl -F, -lane '
    BEGIN { $t="REPLACE INTO student (ID, SIS_ID, STUDENT_NUM, USER_ID, OTHER_USER_ID) VALUES (REPLACE(uuid(), \047-\047, \047\047), \047%s\047, \047%s\047, \047%s\047, \047%s\047);\n"; }
    printf $t, $F[23], $F[25], $F[1], $F[26];
' file
REPLACE INTO student (ID, SIS_ID, STUDENT_NUM, USER_ID, OTHER_USER_ID) VALUES (REPLACE(uuid(), '-', ''), 'thirdparty.sis_id', 'thirdparty.student_number', 'app.app.id', 'thirdparty.id');
REPLACE INTO student (ID, SIS_ID, STUDENT_NUM, USER_ID, OTHER_USER_ID) VALUES (REPLACE(uuid(), '-', ''), '061200F010', '061200F010', '0040FBA053464647BD51141EECF4437F', '533f3a412a1f1fea24c8e164');