How do I search and replace with sed while excluding a set of words?

Date: 2014-05-03 00:23:29

Tags: regex linux bash sed

Hi, in the sed command below I need the second group of parentheses to NOT accept the following set of words: Inc The Ltd LLC

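The command splits the following data from list.txt so that each company name ends up on its own line. Company names are separated by commas, but sometimes ", Inc", ", Ltd", ", LLC" or ", The" follows a company name, and those commas must not be treated as separators.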

This is a fairly advanced regex and I can't seem to get it right.

sed -re 's/([a-zA-Z.]), (Need code here)/\1\n\2/g' list.txt
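sed's regex syntax (BRE, or ERE with -r/-E) has no lookahead, so there is no expression that can be dropped into the placeholder as-is. One common sed-only workaround, swapped in here rather than taken from the answers below, is to protect the commas that belong to a suffix first, split on the remaining ", ", and then restore the protected commas. A minimal sketch, assuming GNU sed (for -E, \b, \x01 and \n in the replacement) and that the byte 0x01 never occurs in the data; the function name split_companies is hypothetical:

```shell
# Hypothetical helper: reads company lists on stdin, writes one company per line.
# Assumes GNU sed and that byte 0x01 never appears in the input.
split_companies() {
  sed -E '
    # 1. Protect ", Inc", ", Ltd", ", LLC", ", The": swap their comma for byte 0x01
    s/, (Inc|Ltd|LLC|The)\b/\x01 \1/g
    # 2. Every ", " that is left now separates two company names
    s/, /\n/g
    # 3. Turn the protected bytes back into commas
    s/\x01/,/g
  '
}
```

Usage would be `split_companies < list.txt`. The \b keeps a name such as "Thebes ..." from being mistaken for the suffix "The".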

list.txt contains the following data:

Electronic Arts, Inc., Electronic Arts Ltd.
Activision Publishing, Inc., ak tronic Software & Services GmbH
Coplin Software
Electronic Arts, Inc.
Electronic Arts, Inc.
In-Fusio
Activision Publishing, Inc.
Domark Ltd.
Electronic Arts, Inc.
Electronic Arts, Inc.
Aspyr Media, Inc., Electronic Arts, Inc.
Activision Deutschland GmbH, Activision Publishing, Inc., ak tronic Software & Services GmbH, Noviy Disk, Square Enix Co., Ltd.
Electronic Arts, Inc.
Electronic Arts, Inc., Electronic Arts Ltd.
Electronic Arts, Inc.
Electronic Arts, Inc.
Electronic Arts, Inc., Electronic Arts Square, K.K., MGM Interactive
Electronic Arts Ltd.

Expected output (note the commas):

GarageGames, Inc.
The Avalon Hill Game Company
Microforum International, The
Telenet Japan Co., Ltd.
Glu Mobile, Inc.
Warner Bros. Digital Distribution
Atari, Inc.

4 answers:

Answer 0 (score: 3)

Based on your sample list.txt, you could try:

  sed -re 's/(, )?(Inc.|The|Ltd.?|LLC)//g' list.txt| tr ',' '\n' | sed -re 's/(.*)/\1/g' | sed -re '/^\s*$/d' | sed -re 's/(^ | $)//g'

Output:

Electronic Arts
Electronic Arts
Activision Publishing
ak tronic Software & Services GmbH
Coplin Software
Electronic Arts
Electronic Arts
In-Fusio
Activision Publishing
Domark
Electronic Arts
Electronic Arts
Aspyr Media
Electronic Arts
Activision Deutschland GmbH
Activision Publishing
ak tronic Software & Services GmbH
Noviy Disk
Square Enix Co.
Electronic Arts
Electronic Arts
Electronic Arts
Electronic Arts
Electronic Arts
Electronic Arts
Electronic Arts Square
K.K.
MGM Interactive
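The pipeline above can be read as three steps. A minimal sketch of the same steps as a shell function, assuming GNU sed for -E; the name strip_suffixes_and_split is hypothetical. It escapes the dot in "Inc\.", leaves out the `sed -re 's/(.*)/\1/g'` stage (it replaces each line with itself, so it is a no-op), and merges the blank-line and trimming stages:

```shell
# Hypothetical name; assumes GNU sed (-E). Same steps as answer 0's pipeline.
strip_suffixes_and_split() {
  # 1. Delete the suffixes, together with a preceding ", " when there is one
  sed -E 's/(, )?(Inc\.|The|Ltd\.?|LLC)//g' |
  # 2. Every comma that is left separates two company names
  tr ',' '\n' |
  # 3. Trim spaces at both ends of each name, then drop lines left empty
  sed -E 's/^ +| +$//g; /^$/d'
}
```

Note that this approach discards the suffixes instead of keeping them (compare "Domark" above with "Domark Ltd." in the input), and it also strips "The" wherever it appears in a name.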

Note:

You can also pipe the list above into awk to show only the unique results, for example:

sed -re 's/(, )?(Inc.|The|Ltd.?|LLC)//g' list.txt| tr ',' '\n' | sed -re 's/(.*)/\1/g' | sed -re '/^\s*$/d' | sed -re 's/(^ | $)//g'| awk '!seen[$0]++'

Output:

Electronic Arts
Activision Publishing
ak tronic Software & Services GmbH
Coplin Software
In-Fusio
Domark
Aspyr Media
Activision Deutschland GmbH
Noviy Disk
Square Enix Co.
Electronic Arts Square
K.K.
MGM Interactive
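The `awk '!seen[$0]++'` idiom works because `seen[$0]++` returns the count before incrementing: 0 (false) the first time a line appears, so `!seen[$0]++` is true exactly once per distinct line. A minimal sketch wrapping it in a function (the name dedupe_keep_order is hypothetical):

```shell
# Hypothetical helper: print each line only the first time it appears.
dedupe_keep_order() {
  # seen[$0]++ yields 0 on the first occurrence, so the pattern is true only then
  awk '!seen[$0]++'
}
```

Unlike `sort -u`, this keeps the names in the order they first occur in list.txt.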

Answer 1 (score: 3)

perl -pe 's/([^,]), (?!Inc|LLC|The|Ltd)/$1\n/g' list.txt

Answer 2 (score: 1)

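sed -nr '
  # \1 = first company name (plus an optional Inc/The/Ltd/LLC suffix), \5 = the rest
  /^ *([^,]+(, *(Inc\.?|The|Ltd\.?|LLC))?)(,(.*))?/ {
    s//\1\n\5/
    # P prints up to the first newline; D deletes it and reruns the script on the rest
    P
    D
  }' list.txt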
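The answer above relies on sed's P/D loop: after `s` inserts a newline, `P` prints only the first line of the pattern space, and `D` deletes through that newline and restarts the script on what remains without reading new input. A minimal, hypothetical illustration of just that idiom (no suffix handling, simply splitting on every ", "), assuming GNU sed for `\n` in the replacement; the name pd_split is hypothetical:

```shell
# Hypothetical helper that isolates the P/D loop (no suffix handling).
pd_split() {
  sed -n '
    # Turn only the first ", " into a newline
    s/, /\n/
    # Print the pattern space up to the first newline
    P
    # Delete through that newline and restart on the remainder without reading
    # a new line; if there is no newline left, D behaves like d
    D
  '
}
```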

Answer 3 (score: 0)

A Perl version:

$ perl -anlF'(?!,[\x20](?:Inc|Ltd|LLC|The).?),[\x20]' -e '$n{$_}++ for @F; END { print join "\n", sort keys %n; }' test.txt
Activision Deutschland GmbH
Activision Publishing, Inc.
Aspyr Media, Inc.
Coplin Software
Domark Ltd.
Electronic Arts Ltd.
Electronic Arts Square
Electronic Arts, Inc.
In-Fusio
K.K.
MGM Interactive
Noviy Disk
Square Enix Co., Ltd.
ak tronic Software & Services GmbH