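I have input (for example, from ifconfig run0 scan on OpenBSD) in which fields are separated by spaces, but some fields themselves contain spaces (luckily, the fields that contain spaces are always enclosed in quotes).
I need to distinguish the spaces inside quotes from the separator spaces. The idea is to replace the spaces inside quotes with underscores.
Sample data: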
%cat /tmp/ifconfig_scan | fgrep nwid | cut -f3
nwid Websense chan 6 bssid 00:22:7f:xx:xx:xx 59dB 54M short_preamble,short_slottime
nwid ZyXEL chan 8 bssid cc:5d:4e:xx:xx:xx 5dB 54M privacy,short_slottime
nwid "myTouch 4G Hotspot" chan 11 bssid d8:b3:77:xx:xx:xx 49dB 54M privacy,short_slottime
This does not end up processed the way I want, because I have not yet replaced the spaces inside the quotes with underscores:
%cat /tmp/ifconfig_scan | fgrep nwid | cut -f3 |\
cut -s -d ' ' -f 2,4,6,7,8 | sort -n -k4
"myTouch Hotspot" 11 bssid d8:b3:77:xx:xx:xx
ZyXEL 8 cc:5d:4e:xx:xx:xx 5dB 54M
Websense 6 00:22:7f:xx:xx:xx 59dB 54M
Answer 0 (score: 4)
Try this:
awk -F'"' '{for(i=2;i<=NF;i++)if(i%2==0)gsub(" ","_",$i);}1' OFS="\"" file
It also works when a line contains several quoted parts:
echo '"first part" foo "2nd part" bar "the 3rd part comes" baz'| awk -F'"' '{for(i=2;i<=NF;i++)if(i%2==0)gsub(" ","_",$i);}1' OFS="\""
"first_part" foo "2nd_part" bar "the_3rd_part_comes" baz
EDIT: an alternative form:
awk 'BEGIN{FS=OFS="\""} {for(i=2;i<NF;i+=2)gsub(" ","_",$i)} 1' file
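To connect this back to the question, here is a minimal sketch that feeds one of the sample lines through the alternative form and then through the original cut. The variable names line, fixed and picked are only for illustration. With " as the field separator, the even-numbered fields are the ones that sat between quotes.

```shell
# One line of the sample scan output from the question.
line='nwid "myTouch 4G Hotspot" chan 11 bssid d8:b3:77:xx:xx:xx 49dB 54M privacy,short_slottime'

# Splitting on '"' puts quoted text in the even-numbered fields,
# so only those fields get their spaces replaced.
fixed=$(printf '%s\n' "$line" |
  awk 'BEGIN{FS=OFS="\""} {for(i=2;i<NF;i+=2)gsub(" ","_",$i)} 1')
printf '%s\n' "$fixed"

# The nwid is now a single space-delimited field, so cut selects the intended columns.
picked=$(printf '%s\n' "$fixed" | cut -s -d ' ' -f 2,4,6,7,8)
printf '%s\n' "$picked"
```

The second line printed should contain the whole network name as one field, followed by the channel, bssid, signal and rate.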
Answer 1 (score: 4)
For a sed-only solution (which I don't necessarily advocate), try:
echo 'a b "c d e" f g "h i"' |\
sed ':a;s/^\(\([^"]*"[^"]*"[^"]*\)*[^"]*"[^"]*\) /\1_/;ta'
a b "c_d_e" f g "h_i"
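Translation: start at the beginning of the line and look for the pattern junk"junk", repeated zero or more times, where junk contains no quotes, followed by junk"junk space. Replace that final space with _. If the substitution succeeded, ta jumps back to the label a and tries again.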
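To see what a single pass of that loop does, the sketch below runs the same substitution without the :a and ta parts. The helper name one_pass and the sample string are made up for illustration.

```shell
# Apply the substitution once, without the :a ... ta loop.
one_pass() {
  printf '%s\n' "$1" |
    sed 's/^\(\([^"]*"[^"]*"[^"]*\)*[^"]*"[^"]*\) /\1_/'
}

s0='a "b c d" e'
s1=$(one_pass "$s0")   # one space inside the quotes is replaced
s2=$(one_pass "$s1")   # the remaining space inside the quotes is replaced
s3=$(one_pass "$s2")   # no quoted space is left, so the line is unchanged
printf '%s\n' "$s1" "$s2" "$s3"
```

Each pass fixes exactly one space and then rescans from the start of the line, which is consistent with the long sed run times in the timings elsewhere in this thread.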
Answer 2 (score: 3)
Another approach to try:
awk '!(NR%2){gsub(FS,"_")}1' RS=\" ORS=\"
To remove the quotes as well:
awk '!(NR%2){gsub(FS,"_")}1' RS=\" ORS=
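These one-liners rely on the fact that, with RS set to a double quote, awk's records alternate between text outside quotes (odd NR) and text inside quotes (even NR). A small sketch to make that visible, using a made-up input line; the trailing newline of the last record is stripped only for display.

```shell
# Print each record with its number: even-numbered records were inside quotes.
records=$(printf '%s\n' 'a "b c" d "e f" g' |
  awk -v RS='"' '{ sub(/\n$/, ""); printf "%d:[%s]\n", NR, $0 }')
printf '%s\n' "$records"

# The first one-liner on a short line, to inspect exactly what it emits.
out=$(printf '%s\n' 'a "b c" d' | awk '!(NR%2){gsub(FS,"_")}1' RS=\" ORS=\")
printf '%s\n' "$out"
```

One thing to be aware of: the last record runs to the end of the input, newline included, so with ORS=\" the first one-liner also prints a " after it. That may need trimming if the output is processed further.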
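Some extra tests with a test file three times the size, following up on @steve's earlier tests. I had to transform the sed statement slightly so that non-GNU seds could handle it too. I added awk (bwk), gawk3, gawk4 and mawk: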
$ for i in {1..1500000}; do echo 'a b "c d e" f g "h i" j k l "m n o "p q r" s t" u v "w x" y z' ; done > test
$ time perl -pe 's:"[^"]*":($x=$&)=~s/ /_/g;$x:ge' test >/dev/null
real 0m27.802s
user 0m27.588s
sys 0m0.177s
$ time awk 'BEGIN{FS=OFS="\""} {for(i=2;i<NF;i+=2)gsub(" ","_",$i)} 1' test >/dev/null
real 0m6.565s
user 0m6.500s
sys 0m0.059s
$ time gawk3 'BEGIN{FS=OFS="\""} {for(i=2;i<NF;i+=2)gsub(" ","_",$i)} 1' test >/dev/null
real 0m21.486s
user 0m18.326s
sys 0m2.658s
$ time gawk4 'BEGIN{FS=OFS="\""} {for(i=2;i<NF;i+=2)gsub(" ","_",$i)} 1' test >/dev/null
real 0m14.270s
user 0m14.173s
sys 0m0.083s
$ time mawk 'BEGIN{FS=OFS="\""} {for(i=2;i<NF;i+=2)gsub(" ","_",$i)} 1' test >/dev/null
real 0m4.251s
user 0m4.193s
sys 0m0.053s
$ time awk '!(NR%2){gsub(FS,"_")}1' RS=\" ORS=\" test >/dev/null
real 0m13.229s
user 0m13.141s
sys 0m0.075s
$ time gawk3 '!(NR%2){gsub(FS,"_")}1' RS=\" ORS=\" test >/dev/null
real 0m33.965s
user 0m26.822s
sys 0m7.108s
$ time gawk4 '!(NR%2){gsub(FS,"_")}1' RS=\" ORS=\" test >/dev/null
real 0m15.437s
user 0m15.328s
sys 0m0.087s
$ time mawk '!(NR%2){gsub(FS,"_")}1' RS=\" ORS=\" test >/dev/null
real 0m4.002s
user 0m3.948s
sys 0m0.051s
$ time sed -e :a -e 's/^\(\([^"]*"[^"]*"[^"]*\)*[^"]*"[^"]*\) /\1_/;ta' test > /dev/null
real 5m14.008s
user 5m13.082s
sys 0m0.580s
$ time gsed -e :a -e 's/^\(\([^"]*"[^"]*"[^"]*\)*[^"]*"[^"]*\) /\1_/;ta' test > /dev/null
real 4m11.026s
user 4m10.318s
sys 0m0.463s
mawk gave the fastest results...
Answer 3 (score: 2)
You'd be better off with perl. The code is more readable and maintainable:
perl -pe 's:"[^"]*":($x=$&)=~s/ /_/g;$x:ge'
With your input, the result is:
a b "c_d_e" f g "h_i"
Explanation:
-p # enable printing
-e # the following expression...
s # begin a substitution
: # the first substitution delimiter
"[^"]*" # match a double quote followed by anything not a double quote any
# number of times followed by a double quote
: # the second substitution delimiter
($x=$&)=~s/ /_/g; # copy the pattern match ($&) into a variable ($x), then
# substitute an underscore for each space globally on $x. The
# variable $x is needed because capture groups and
# patterns are read only variables.
$x # return $x as the replacement.
: # the last delimiter
g # perform the nested substitution globally
e # make sure that the replacement is handled as an expression
Some tests:
for i in {1..500000}; do echo 'a b "c d e" f g "h i" j k l "m n o "p q r" s t" u v "w x" y z' >> test; done
time perl -pe 's:"[^"]*":($x=$&)=~s/ /_/g;$x:ge' test >/dev/null
real 0m8.301s
user 0m8.273s
sys 0m0.020s
time awk 'BEGIN{FS=OFS="\""} {for(i=2;i<NF;i+=2)gsub(" ","_",$i)} 1' test >/dev/null
real 0m4.967s
user 0m4.924s
sys 0m0.036s
time awk '!(NR%2){gsub(FS,"_")}1' RS=\" ORS=\" test >/dev/null
real 0m4.336s
user 0m4.244s
sys 0m0.056s
time sed ':a;s/^\(\([^"]*"[^"]*"[^"]*\)*[^"]*"[^"]*\) /\1_/;ta' test >/dev/null
real 2m26.101s
user 2m25.925s
sys 0m0.100s
Answer 4 (score: 1)
Not an answer, just posting the awk equivalent of @steve's perl code in case anyone is interested (and to help me remember it):
@steve posted:
perl -pe 's:"[^\"]*":($x=$&)=~s/ /_/g;$x:ge'
and from reading @steve's explanation, the briefest awk equivalent to that perl code (NOT the preferred awk solution, see @Kent's answer for that) would be this GNU awk:
gawk '{
head = ""
while ( match($0,"\"[^\"]*\"") ) {
head = head substr($0,1,RSTART-1) gensub(/ /,"_","g",substr($0,RSTART,RLENGTH))
$0 = substr($0,RSTART+RLENGTH)
}
print head $0
}'
which we get to by starting from a POSIX awk solution with more variables:
awk '{
head = ""
tail = $0
while ( match(tail,"\"[^\"]*\"") ) {
x = substr(tail,RSTART,RLENGTH)
gsub(/ /,"_",x)
head = head substr(tail,1,RSTART-1) x
tail = substr(tail,RSTART+RLENGTH)
}
print head tail
}'
and saving a line with GNU awk's gensub():
gawk '{
head = ""
tail = $0
while ( match(tail,"\"[^\"]*\"") ) {
x = gensub(/ /,"_","g",substr(tail,RSTART,RLENGTH))
head = head substr(tail,1,RSTART-1) x
tail = substr(tail,RSTART+RLENGTH)
}
print head tail
}'
then getting rid of the variable x:
gawk '{
head = ""
tail = $0
while ( match(tail,"\"[^\"]*\"") ) {
head = head substr(tail,1,RSTART-1) gensub(/ /,"_","g",substr(tail,RSTART,RLENGTH))
tail = substr(tail,RSTART+RLENGTH)
}
print head tail
}'
then getting rid of the variable "tail", if you don't need $0, NF, etc. left intact after the loop:
gawk '{
head = ""
while ( match($0,"\"[^\"]*\"") ) {
head = head substr($0,1,RSTART-1) gensub(/ /,"_","g",substr($0,RSTART,RLENGTH))
$0 = substr($0,RSTART+RLENGTH)
}
print head $0
}'