Negative matching with sed on a CSV file

Time: 2017-05-04 16:47:09

Tags: sed

I have a CSV file in the following format:

$ tail X.csv | sed 's/[a-zA-Z0-9]/X/g'
XXXXXXX/XXXXXXXX XXXXXXXXXXXX), XXXXXXXXXXXXXXXXXXXX, XXXXXXXXXXXXXX (X),XXXXX,,X,XXX,XXXXXXX,,,{XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX}
XXXXXXXXX,XXXX-XX-XX XX:XX:XX.XXXXXXXXX,XX,XXXXX,X,XXXXXX,X,XXXXXX,XXXXXXXX (XXXXXXX XXXXXX),XXXXX,XX.XXX.XXX.XX,XXXXX,XXXXX XXXXXXXXX XXXXXX XXXX XXX XXXXXXXX XX XXXXXXX XXX XXXXXXXX XXXXXXX (XXXXXXXXX): XXXXXXXX X XXXXXXXXXX XXXX X XXXXXXXXXX.,XXXXX,,X,XXX,XXXXXXX,,,{XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX}
XXXXXXXXX,XXXX-XX-XX XX:XX:XX.XXXXXXXXX,XX,XXXXX,X,XXXXXX,X,XXXXXX,XXXXXXXX (XXXXXXX XXXXXX),XXXXX,XX.XXX.XXX.XX,XXXXX,XXXXXXX XXX XXXXXXXX XXX XXXXXXXX XXXXXXX (XXXXXXXXX) (XXXXXXX XXXXXXXXXXXXXX),XXXXX,,X,XXX,XXXXXXX,,,{XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX}
XXXXXXXXX,XXXX-XX-XX XX:XX:XX.XXXXXXXXX,XX,XXXXX,X,XXXXXX,X,XXXXX,XXXXXXXXX (XXXXXX XXXXXXX XXXXXXX),XXXXX,XX.XXX.XXX.XX,XXXX,XXXXXXXX XXXXXXXXX XXXXXXXX XXX XXXXXX XXXXXXX XXXXXXX (XXXXXXXXX).,XXXXX,,X,XXX,XXXXXXX,,,{XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX}
XXXXXXXXX,XXXX-XX-XX XX:XX:XX.XXXXXXXXX,XX,XXXXX,X,XXXXXX,X,XXXXXX,XXXXXXXXXXXX (XXXXXX XXXXXXXXXX),XXXXX,XX.XXX.XXX.XX,XXXXX,XXXXXXXX XXXX XXXXXXX(X) XX XX/XX/XXXX XXX XXXXXXX XXXXXXXX (XXXXXXXXX).,XXXXX,,X,X,X,,,{XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX}
XXXXXXXXX,XXXX-XX-XX XX:XX:XX.XXXXXXXXX,XX,XXXXX,X,XXXXXX,X,XXXXX,XXXXXXXXXXX (XXXXXXX XXXXXXXXX),XXXXX,XX.XXX.XXX.XX,XXXX,XXXXXXX XXX XXXXXXXX XXX XXXXXX XXXXX (XXXXXXXXX) (XXXXXXXXXXX XX XXXXX XXX XXXXXXXX-XXXX XXXXXXXXXXX): XXXXXXXXXXXXXXXXXXX (XXXXX), XXXXXXXXXXXXXXXXXX (XXXXX), XXXXXXXXXXXXXX (XXXX), XXXXXXXXXXXXXXXX (XXXXX),XXXXX,,X,XXX,XXXXXXX,,,{XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX}
XXXXXXXXX,XXXX-XX-XX XX:XX:XX.XXXXXXXXX,XX,XXXXX,X,XXXXXX,X,XXXXXX,XXXXXXXX (XXXXXXX XXXXXX),XXXXX,XX.XXX.XXX.XX,XXXXX,XXXXXXX XXX XXXXXXXX XXX XXXXXXXX XXXXXXX (XXXXXXXXX) (XXXXXXX XXXXXXXXXXXXXX),XXXXX,,X,XXX,XXXXXXX,,,{XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX}
XXXXXXXXX,XXXX-XX-XX XX:XX:XX.XXXXXXXXX,XX,XXXXX,X,XXXXXX,X,XXXXXX,XXXXXXXXXXXX (XXXXXX XXXXXXXXXX),XXXXX,XX.XXX.XXX.XX,XXXXX,XXXXXXXX XXXX XXXXXXX(X) XX XX/XX/XXXX XXX XXXXX XXXXXXXX (XXXXXXXXX).,XXXXX,,X,X,X,,,{XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX}
XXXXXXXXX,XXXX-XX-XX XX:XX:XX.XXXXXXXXX,XX,XXXXX,X,XXXXXX,X,XXXXXX,XXXXXXXXXXXX (XXXXXXXX XXXXXXXXX),XXXXX,XX.XXX.XXX.XX,XXXX,XXXXXXX XXX XXXXXXXX XXX XXXXXXX XXXXX (XXXXXXXXX) (XXXXXXX XXXX): XXXXXXXXXXXXXX (XXXXX),XXXXX,,X,XXX,XXXXXXX,,,{XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX}
XXXXXXXXX,XXXX-XX-XX XX:XX:XX.XXXXXXXXX,XX,XXXXX,X,XXXXXX,X,XXXXXX,XXXXXXX (XXXXXXXX XXXXX),XXXXX,XX.XXX.XXX.XX,XXXXX,XXXXXXX XXX XXXXXXXX XXX XXXXXX XXXXXXXX (XXXXXXXXX) (XXXXXXX XXXXXX XX XXXXXXXXXXX XXX XXXXXXXXXX XXXXX): XXXXXXXXXXXXXXXXXX, XXXXXXXXXXXXXXXXXX, XXXXXXXXXXXXXXX, XXXXXXXXXXXXXXXX, XXXXXXXXXXXXXXXXXXXXXXXX (XXXXXX XXXXX XXXXXX XXXXXXXXXXXXX XXXX XXX XXXXX. XXX XX XXXX XXXXXX.), XXXXXXXXXXXXXXXXXXXXXXXXXXXX (XXX XX XXXX XXXXX XXX XXX XXXX XXXXXXX.),XXXXX,,X,XXX,XXXXXXX,,,{XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX}
$ 

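Besides the commas that act as delimiters, the generated CSV file also contains commas inside field values, so I need sed(1) to replace only the delimiter commas with another delimiter, for example |

Unfortunately, the file cannot be regenerated with a different delimiter.

My unsuccessful attempt: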

$ tail X.csv | sed 's/[a-zA-Z0-9]/X/g' | sed --regexp-extended '/,/!s/,%s/|/g' | tail -1 
XXXXXXXXX,XXXX-XX-XX XX:XX:XX.XXXXXXXXX,XX,XXXXX,X,XXXXXX,X,XXXXXX,XXXXXXX (XXXXXXXX XXXXX),XXXXX,XX.XXX.XXX.XX,XXXXX,XXXXXXX XXX XXXXXXXX XXX XXXXXX XXXXXXXX (XXXXXXXXX) (XXXXXXX XXXXXX XX XXXXXXXXXXX XXX XXXXXXXXXX XXXXX): XXXXXXXXXXXXXXXXXX, XXXXXXXXXXXXXXXXXX, XXXXXXXXXXXXXXX, XXXXXXXXXXXXXXXX, XXXXXXXXXXXXXXXXXXXXXXXX (XXXXXX XXXXX XXXXXX XXXXXXXXXXXXX XXXX XXX XXXXX. XXX XX XXXX XXXXXX.), XXXXXXXXXXXXXXXXXXXXXXXXXXXX (XXX XX XXXX XXXXX XXX XXX XXXX XXXXXXX.),XXXXX,,X,XXX,XXXXXXX,,,{XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX}
$ 

How can I solve this?

3 answers:

Answer 0: (score: 1)

I'm not a fan of sed, so here is a version using perl:

cat X.csv | perl -p -e "s/,(\S)/|\$1/g"

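This basically means "replace every ',' that is followed by a non-whitespace character with '|' followed by that same non-whitespace character".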

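Or here is a version using sed (it uses the POSIX character class [[:space:]] instead of \S, so it should work with any sed that supports -E):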

cat X.csv | sed -E 's/,([^[:space:]])/|\1/g'
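One caveat: the captured character is consumed by the match, so in a run of adjacent commas (empty fields such as `,,X`) only every other comma gets replaced, and a comma at the very end of a line is never replaced. sed has no lookahead, but a possible workaround is to repeat a single substitution until nothing matches any more. This is a sketch, not a tested fix for the real file: the function name `commas_to_pipes` is hypothetical, and it assumes that any comma followed by whitespace belongs to a value.

```shell
# Hypothetical helper (not from the answers): turn delimiter commas into '|'.
# Assumption: a comma followed by whitespace is part of a value and is kept.
commas_to_pipes() {
    sed -E \
        -e ':again' \
        -e 's/,([^[:space:]])/|\1/' \
        -e 't again' \
        -e 's/,$/|/'
}

printf '%s\n' 'a,b,,c, d,,,{e}' | commas_to_pipes
# prints: a|b||c, d|||{e}
```

Each pass replaces the leftmost remaining delimiter comma, and `t again` jumps back to the label as long as a substitution succeeded; the final `s/,$/|/` handles a comma at the end of the line. The loop ends because every pass removes one comma.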

Answer 1: (score: 0)

Use:

sed -re 's/([^ ]),([^ ])/\1|\2/g'

Answer 2: (score: 0)

...with @nochkin's help, I came up with a sed solution:

$ tail -1 X.csv | sed 's/[a-zA-Z0-9]/X/g' | sed --regexp-extended 's/,(\S)/|\1/g' 
XXXXXXXXX|XXXX-XX-XX XX:XX:XX.XXXXXXXXX|XX|XXXXX|X|XXXXXX|X|XXXXXX|XXXXXXX (XXXXXXXX XXXXX)|XXXXX|XX.XXX.XXX.XX|XXXXX|XXXXXXX XXX XXXXXXXX XXX XXXXXX XXXXXXXX (XXXXXXXXX) (XXXXXXX XXXXXX XX XXXXXXXXXXX XXX XXXXXXXXXX XXXXX): XXXXXXXXXXXXXXXXXX, XXXXXXXXXXXXXXXXXX, XXXXXXXXXXXXXXX, XXXXXXXXXXXXXXXX, XXXXXXXXXXXXXXXXXXXXXXXX (XXXXXX XXXXX XXXXXX XXXXXXXXXXXXX XXXX XXX XXXXX. XXX XX XXXX XXXXXX.), XXXXXXXXXXXXXXXXXXXXXXXXXXXX (XXX XX XXXX XXXXX XXX XXX XXXX XXXXXXX.)|XXXXX|,X|XXX|XXXXXXX|,|{XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX}
$ sed --version
sed (GNU sed) 4.2.2
Copyright (C) 2012 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>.
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

Written by Jay Fenlason, Tom Lord, Ken Pizzini,
and Paolo Bonzini.
GNU sed home page: <http://www.gnu.org/software/sed/>.
General help using GNU software: <http://www.gnu.org/gethelp/>.
E-mail bug reports to: <bug-sed@gnu.org>.
Be sure to include the word ``sed'' somewhere in the ``Subject:'' field.
$
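In the converted line above, sequences such as `|,X` and `|,|` remain where the input had empty fields (`,,`). One way to spot this kind of residue is to count the `|`-separated fields per line. The sketch below assumes that every record in the file should have the same number of columns, which is a guess about this data; `field_counts` is a made-up helper name.

```shell
# Hypothetical helper: for '|'-delimited input, print each distinct
# field count followed by the number of lines that have that count.
field_counts() {
    awk -F'|' '{ n[NF]++ } END { for (k in n) print k, n[k] }' | sort -n
}

printf '%s\n' 'a|b||c' 'a|b|,c' | field_counts
# prints:
# 3 1
# 4 1
```

If the output shows more than one distinct count, some lines probably still contain unconverted delimiter commas, or a comma inside a value was converted by mistake.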