How do I remove leading and trailing spaces from all columns in a CSV?

Time: 2014-07-24 16:36:46

Tags: regex perl

My CSV looks like this:

things,ID,hello_field,more things
stuff,123  ,hello ,more stuff
stuff,123 ,hello ,more stuff
stuff ,123  ,hello ,more stuff
stuff,123  ,hello ,more stuff
stuff ,123,hello ,more stuff
stuff,123,hello ,more stuff
stuff ,123,hello ,more stuff

How can I remove leading and trailing spaces from every column except the second one (ID)? The final output should look like this:

things,ID,hello_field,more things
stuff,123  ,hello,more stuff
stuff,123 ,hello,more stuff
stuff,123  ,hello,more stuff
stuff,123  ,hello,more stuff
stuff,123,hello,more stuff
stuff,123,hello,more stuff
stuff,123,hello,more stuff

I tried the following regex, but it removes the spaces from every field, including those in the ID column.

s/( +,|, +)/,/gi;

5 answers:

Answer 0 (score: 3)

Split, selectively trim, and rejoin:

perl -F, -lane 's/^\s+|\s+$//g for @F[0,2..$#F]; print join ",", @F' file.csv

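Explanation:

Switches

  • -F/pattern/: the split() pattern used by the -a switch (the // around the pattern is optional)
  • -l: enables line-ending processing
  • -a: splits each line and loads the fields into the array @F (on whitespace by default, on the -F pattern here)
  • -n: creates a while(<>){...} loop over every line of the input file
  • -e: tells perl to execute the code given on the command line

Code

  • EXPR for @F[0,2..$#F]: iterates over an array slice (skipping the second field)
  • s/^\s+|\s+$//g: removes leading and trailing whitespace from the field
  • print join ",", @F: prints the result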
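The same split, trim, and rejoin steps can also be written as a small script. This is only a sketch: the sub name trim_except and the inline sample lines are made up for illustration, and it assumes plain comma-separated fields with no quoting or embedded commas.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: trim every field except the one at index $keep
# (0-based, so 1 is the ID column).
sub trim_except {
    my ($line, $keep) = @_;
    my @fields = split /,/, $line, -1;    # -1 keeps trailing empty fields
    for my $i (0 .. $#fields) {
        next if $i == $keep;              # leave the ID column untouched
        $fields[$i] =~ s/^\s+|\s+$//g;    # strip leading and trailing whitespace
    }
    return join ",", @fields;
}

# Two sample lines taken from the question.
for my $line ("stuff ,123  ,hello ,more stuff", "stuff,123 ,hello ,more stuff") {
    print trim_except($line, 1), "\n";
}
# prints:
# stuff,123  ,hello,more stuff
# stuff,123 ,hello,more stuff
```

To process a file instead, the loop could read lines with while (<>) and chomp each one before calling the sub.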

Answer 1 (score: 0)

Using awk:

awk -F, -v OFS=, '{ for (i = 1; i <= NF; ++i) if (i != 2) { sub(/^[ \t]+/, "", $i); sub(/[ \t]+$/, "", $i) } } 1' file

Output:

things,ID,hello_field,more things
stuff,123  ,hello,more stuff
stuff,123 ,hello,more stuff
stuff,123  ,hello,more stuff
stuff,123  ,hello,more stuff
stuff,123,hello,more stuff
stuff,123,hello,more stuff
stuff,123,hello,more stuff

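What it does:

  • Sets both the field separator and the output field separator to ,
  • Loops over the fields. If the field number is not 2, trims leading and trailing spaces and tabs.
  • Prints the line.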

Answer 2 (score: 0)

You can spell out each field in the substitution:

#! /usr/bin/env perl
use warnings;
use strict;
use feature qw(say);

for my $line ( <DATA> ) {
    chomp $line;
    $line =~ s/^\s*(\S+)\s*,   # Things: trim off the spaces
        (.+?),                # ID: Leave alone
        \s*(\S+)\s*,          # Hello Field: trim off spaces
        \s*(\S+)\s*           # More things: trim off spaces
        /$1,$2,$3,$4/x;
    say $line;
}

__DATA__
things,ID,hello_field,more things
stuff,123  ,hello ,more stuff
stuff,123 ,hello ,more stuff
stuff ,123  ,hello ,more stuff
stuff,123  ,hello ,more stuff
stuff ,123,hello ,more stuff   
stuff,123,hello ,more stuff
stuff ,123,hello ,more stuff

Here I use the x flag at the end of the regex, which lets me break the expression across multiple lines.

This produces:

things,ID,hello_field,morethings
stuff,123  ,hello,morestuff
stuff,123 ,hello,morestuff
stuff,123  ,hello,morestuff
stuff,123  ,hello,morestuff
stuff,123,hello,morestuff   
stuff,123,hello,morestuff
stuff,123,hello,morestuff

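I considered using named capture groups. They are nice when you move things around and have a lot of capture groups. In this case, however, I don't think they make things any easier to read: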

#! /usr/bin/env perl
use warnings;
use strict;
use feature qw(say);

for my $line ( <DATA> ) {
    chomp $line;
    $line =~ s/^\s*(?<things>\S+)\s*,       # Things: trim off the spaces
        (?<id>.+?),                         # ID: Leave alone
        \s*(?<hello_field>\S+)\s*,          # Hello Field: trim off spaces
        \s*(?<more_things>\S+)\s*           # More things: trim off spaces
        /$+{things},$+{id},$+{hello_field},$+{more_things}/x;
    say $line;
}

__DATA__
things,ID,hello_field,more things
stuff,123  ,hello ,more stuff
stuff,123 ,hello ,more stuff
stuff ,123  ,hello ,more stuff
stuff,123  ,hello ,more stuff
stuff ,123,hello ,more stuff   
stuff,123,hello ,more stuff
stuff ,123,hello ,more stuff
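
Both versions above capture the trimmed fields with \S+, which stops at the first space; the \s* that follows then swallows that space, which is why "more things" comes out as "morethings". A minimal sketch of a variant that keeps inner spaces, assuming exactly four fields and no quoted or embedded commas (the sub name trim_four_fields is made up for this illustration):

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: trims fields 1, 3 and 4 of a four-field line
# and leaves field 2 (the ID) exactly as it is.
sub trim_four_fields {
    my ($line) = @_;
    $line =~ s/^\s*([^,]*?)\s*,    # Things: trim, inner spaces kept
        ([^,]*),                   # ID: leave alone
        \s*([^,]*?)\s*,            # Hello field: trim
        \s*([^,]*?)\s*\z           # More things: trim, anchored at the end
        /$1,$2,$3,$4/x;
    return $line;
}

print trim_four_fields("stuff ,123  ,hello ,more stuff   "), "\n";
print trim_four_fields("things,ID,hello_field,more things"), "\n";
# prints:
# stuff,123  ,hello,more stuff
# things,ID,hello_field,more things
```

The lazy [^,]*? lets the trailing \s* claim the spaces at the end of a field, and the \z anchor makes the last group cover the whole field rather than just its first word.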

Answer 3 (score: 0)

I prefer @Miller's answer, which uses a regex as the OP requested, but there is also Text::Trim if you need it:

perl -MText::Trim -F, -anE 'trim for @F[0,2..$#F]; say join ",", @F' test.csv

Or:

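use strict;
use warnings;
use Text::Trim;

while (<>) {
  chomp;
  my @line = split /,/;
  trim for @line[0,2..$#line];     # trims each selected field in place
  print join(",", @line), "\n";    # join the fields only, then add the newline
}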

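I hope I am not hijacking this thread, but I would like to explain to myself why Text::Trim works here while String::Util qw/trim/ does not. More to the point of the OP's question: why does one behave just like applying s/// (i.e. an expression) to the iterated value while the other does not? I think it has to do with modifying the original value of the string, i.e. String::Util's trim is more like using the post-5.14 "non-destructive substitution" flag, aka /r, as in s/^\s+|\s+$//rg, whereas Text::Trim trims more directly...

In any case, Text::Trim uses these regexes: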

s/\A\s+//; s/\s+\z// ;    

(along with wantarray and so on), whereas String::Util's trim sub does something, erm, different... maybe that is useful here ;-)
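
The difference between trimming in place and returning a trimmed copy can be seen with plain Perl. The two subs below are hypothetical stand-ins written for this illustration, not the source of either module, and the /r flag assumes Perl 5.14 or later.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# In-place style: @_ aliases the caller's values, so they are modified directly.
sub trim_in_place {
    s/\A\s+//, s/\s+\z// for @_;
    return;
}

# Copy style: the argument is left untouched and a trimmed copy is returned.
sub trim_copy {
    my ($s) = @_;
    return $s =~ s/\A\s+|\s+\z//gr;
}

my @fields = ("stuff ", "123  ", "hello ");

trim_copy($_) for @fields[0, 2];         # return value discarded: nothing changes
print "[", join("][", @fields), "]\n";   # prints [stuff ][123  ][hello ]

trim_in_place($_) for @fields[0, 2];     # the aliases reach the array elements
print "[", join("][", @fields), "]\n";   # prints [stuff][123  ][hello]
```

With a copy-returning trim, the loop only has an effect once the result is assigned back, for example $_ = trim_copy($_) for @fields[0, 2];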

Answer 4 (score: -1)

Although I have stored the content in a variable here, you can adapt this as needed. So, try this:

#!/usr/bin/perl
use strict;
use Data::Dumper;

my $str="things,ID,hello_field,more things
stuff,123  ,hello ,more stuff
stuff,123 ,hello ,more stuff
stuff ,123  ,hello ,more stuff
stuff,123  ,hello ,more stuff
stuff ,123,hello ,more stuff
stuff,123,hello ,more stuff
stuff ,123,hello ,more stuff";

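# Split each line into the first field, the ID, and the rest (assumes at least
# three fields), then trim everything except the ID.
$str = join "\n", map {
    my ($first, $id, $rest) = /^(.*?),(.*?),(.*)$/s;
    $first =~ s/^\s+|\s+$//g;    # trim the first field
    $rest  =~ s/\s*,\s*/,/g;     # trim around the remaining commas
    $rest  =~ s/^\s+|\s+$//g;    # trim the outer ends of the remaining fields
    join ",", $first, $id, $rest;
} split /\n/, $str;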

print $str;

Output:

things,ID,hello_field,more things
stuff,123  ,hello,more stuff
stuff,123 ,hello,more stuff
stuff,123  ,hello,more stuff
stuff,123  ,hello,more stuff
stuff,123,hello,more stuff
stuff,123,hello,more stuff
stuff,123,hello,more stuff