My CSV looks like this:
things,ID,hello_field,more things
stuff,123 ,hello ,more stuff
stuff,123 ,hello ,more stuff
stuff ,123 ,hello ,more stuff
stuff,123 ,hello ,more stuff
stuff ,123,hello ,more stuff
stuff,123,hello ,more stuff
stuff ,123,hello ,more stuff
How can I remove leading and trailing whitespace from every column except the second one (ID)? The final output should look like this:
things,ID,hello_field,more things
stuff,123 ,hello,more stuff
stuff,123 ,hello,more stuff
stuff,123 ,hello,more stuff
stuff,123 ,hello,more stuff
stuff,123,hello,more stuff
stuff,123,hello,more stuff
stuff,123,hello,more stuff
I tried the following regex, but it removes the spaces in every field, including the fields in the ID column.
s/( +,|, +)/,/gi;
Answer 0 (score: 3)
Split, selectively trim, rejoin:
perl -F, -lane 's/^\s+|\s+$//g for @F[0,2..$#F]; print join ",", @F' file.csv
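Switches:
-F/pattern/: the split() pattern for the -a switch (the //'s are optional)
-l: enables line-ending processing
-a: splits each line and loads the pieces into the array @F (on whitespace by default, here on the -F pattern)
-n: creates a while(<>){...} loop over each line of the input file
-e: tells perl to execute the code given on the command line
Code:
EXPR for @F[0,2..$#F]: iterates over an array slice (skipping the second field)
s/^\s+|\s+$//g: removes leading and trailing whitespace from the field
print join ",", @F: prints the result
Answer 1 (score: 0)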
With awk:
awk -F, -v OFS=, '{ for (i = 1; i <= NF; ++i) if (i != 2) { sub(/^[ \t]+/, "", $i); sub(/[ \t]+$/, "", $i) } } 1' file
Output:
things,ID,hello_field,more things
stuff,123 ,hello,more stuff
stuff,123 ,hello,more stuff
stuff,123 ,hello,more stuff
stuff,123 ,hello,more stuff
stuff,123,hello,more stuff
stuff,123,hello,more stuff
stuff,123,hello,more stuff
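A variation on the same idea, as a sketch: the two sub() calls can be folded into one gsub() with an alternation, and the column to skip can be passed in as a variable. The function name `trim_csv_except` and the awk variable `skip` are hypothetical names chosen for this example.

```shell
# trim_csv_except: hypothetical helper; $1 is the 1-based column to leave untouched.
trim_csv_except() {
  awk -F, -v OFS=, -v skip="$1" '{
    for (i = 1; i <= NF; ++i)
      if (i != skip)
        gsub(/^[ \t]+|[ \t]+$/, "", $i)   # strip both ends of the field in one call
  } 1'
}

printf 'stuff ,123 ,hello ,more stuff\n' | trim_csv_except 2
```

Setting OFS to a comma matters here: modifying any field makes awk rebuild the record using OFS, so the commas are preserved.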
Answer 2 (score: 0)
You can spell out each field in the substitution:
#! /usr/bin/env perl
use warnings;
use strict;
use feature qw(say);
for my $line ( <DATA> ) {
    chomp $line;
    $line =~ s/^\s*(\S+)\s*,   # Things: trim off the spaces
               (.+?),          # ID: Leave alone
               \s*(\S+)\s*,    # Hello Field: trim off spaces
               \s*(.*?)\s*$    # More things: trim the ends, keep the inner space
              /$1,$2,$3,$4/x;
    say $line;
}
__DATA__
things,ID,hello_field,more things
stuff,123 ,hello ,more stuff
stuff,123 ,hello ,more stuff
stuff ,123 ,hello ,more stuff
stuff,123 ,hello ,more stuff
stuff ,123,hello ,more stuff
stuff,123,hello ,more stuff
stuff ,123,hello ,more stuff
Here I use the x flag at the end of the regex, which lets me break the expression across multiple lines and comment each part. The last field is captured with (.*?) anchored at the end of the line rather than \S+, because it contains a space that must be kept.
This produces:
things,ID,hello_field,more things
stuff,123 ,hello,more stuff
stuff,123 ,hello,more stuff
stuff,123 ,hello,more stuff
stuff,123 ,hello,more stuff
stuff,123,hello,more stuff
stuff,123,hello,more stuff
stuff,123,hello,more stuff
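I also considered using named capture groups. They are nice when you move things around and have a lot of capture groups. In this case, however, I don't think they make things any easier to read: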
#! /usr/bin/env perl
use warnings;
use strict;
use feature qw(say);
for my $line ( <DATA> ) {
    chomp $line;
    $line =~ s/^\s*(?<things>\S+)\s*,       # Things: trim off the spaces
               (?<id>.+?),                  # ID: Leave alone
               \s*(?<hello_field>\S+)\s*,   # Hello Field: trim off spaces
               \s*(?<more_things>.*?)\s*$   # More things: trim the ends, keep the inner space
              /$+{things},$+{id},$+{hello_field},$+{more_things}/x;
    say $line;
}
__DATA__
things,ID,hello_field,more things
stuff,123 ,hello ,more stuff
stuff,123 ,hello ,more stuff
stuff ,123 ,hello ,more stuff
stuff,123 ,hello ,more stuff
stuff ,123,hello ,more stuff
stuff,123,hello ,more stuff
stuff ,123,hello ,more stuff
Answer 3 (score: 0)
I prefer @Miller's answer, which uses a regex as the OP requested, but there is also Text::Trim when needed:
perl -MText::Trim -F, -anE 'trim for @F[0,2..$#F]; say join ",", @F' test.csv
or:
use Text::Trim;
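while (<>) {
    chomp;                              # drop the newline before splitting
    my @line = split /,/;
    trim for @line[0, 2 .. $#line];     # trim every field except the second
    print join(",", @line), "\n";       # join only the fields, then add the newline
}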
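I hope I am not hijacking this thread, but I want to work out for myself why Text::Trim works here and String::Util qw/trim/ does not and, more to the OP's question, why one behaves like applying s// (that is, an expression) to the iterated values while the other does not. I think it comes down to whether the original string is modified. That is, the String::Util version of trim is more like using the "non-destructive substitution" flag available since Perl 5.14, aka "/r":
s/^\s+|\s+$//rg
Text::Trim trims more directly, in place...
In any case, Text::Trim uses these regexes:
s/\A\s+//; s/\s+\z// ;
(along with wantarray and so on), whereas String::Util's {{1}} sub is, errm, different... maybe that is useful here ;-)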
Answer 4 (score: -1)
I have stored the content in a variable here, but you can adapt it as needed. Try this:
#!/usr/bin/perl
use strict;
use Data::Dumper;
my $str="things,ID,hello_field,more things
stuff,123 ,hello ,more stuff
stuff,123 ,hello ,more stuff
stuff ,123 ,hello ,more stuff
stuff,123 ,hello ,more stuff
stuff ,123,hello ,more stuff
stuff,123,hello ,more stuff
stuff ,123,hello ,more stuff";
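# For each line: split off the first field and the ID, then treat the rest as one chunk.
$str = join "\n", map {
    my ($first, $id, $rest) = split /,/, $_, 3;
    $first =~ s/^\s+|\s+$//g;    # trim the first field
    $rest  =~ s/^\s+|\s+$//g;    # trim both ends of the remainder
    $rest  =~ s/\s*,\s*/,/g;     # trim around the remaining commas
    join ",", $first, $id, $rest;
} split /\n/, $str;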
print $str;
Output:
things,ID,hello_field,more things
stuff,123 ,hello,more stuff
stuff,123 ,hello,more stuff
stuff,123 ,hello,more stuff
stuff,123 ,hello,more stuff
stuff,123,hello,more stuff
stuff,123,hello,more stuff
stuff,123,hello,more stuff