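Using sed to delete the start of a line up to an uppercase word

Time: 2013-10-08 08:57:49

Tags: regex sed regex-greedy

I am trying to use sed to delete the beginning of multiple lines. The goal is to delete, on each line, every character up to (but not including) the first word that contains two consecutive uppercase letters.

The input will always look like this: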

1 where did you get ACQUIRE, obtain, come by, receive, gain, earn, win, come into, take 
2 I got your letter: RECEIVE, be sent, be in receipt of, be given.
3 your tea is getting cold: BECOME, grow, turn, go.
4 get the children from school: FETCH, collect, go for, call for, pick up, bring, deliver, convey, ferry, transport.
5 the chairman gets £650,000 a year: EARN, be paid, take home, bring in, make, receive, collect, gross; informal pocket, bank, rake in, net, bag.
6 have the police got their man?: APPREHEND, catch.

I would like the output to be:

ACQUIRE, obtain, come by, receive, gain, earn, win, come into, take 
RECEIVE, be sent, be in receipt of, be given.
BECOME, grow, turn, go.
FETCH, collect, go for, call for, pick up, bring, deliver, convey, ferry, transport.
EARN, be paid, take home, bring in, make, receive, collect, gross; informal pocket, bank, rake in, net, bag.
APPREHEND, catch.

So far I have come up with this:

sed -n 's/^.*[A-Z]\{2\}//p'

But this expression also deletes the uppercase word. Any clues on how to do this?
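A small reproduction of the symptom, as a sketch: `.*` is greedy, so the match runs through the last pair of consecutive capitals on the line, and the whole uppercase word is consumed with it. The function name `greedy_strip` is hypothetical, and `LC_ALL=C` is an added assumption so that `[A-Z]` means ASCII capitals only.

```shell
# Hypothetical wrapper around the command from the question.
# LC_ALL=C is an assumption added here, not part of the original command.
greedy_strip() {
  LC_ALL=C sed -n 's/^.*[A-Z]\{2\}//p' "$@"
}

# The greedy .* swallows "BECOME" because "ME" is the last capital pair.
printf '%s\n' '3 your tea is getting cold: BECOME, grow, turn, go.' | greedy_strip
# prints ", grow, turn, go."
```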

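2 answers:

Answer 0: (score: 1)

The problem with sed is that it has neither lookahead nor non-greedy quantifiers. One way to work around this is to do two substitutions. The first captures the text you want, saves it as group 1, and puts it back after a newline; the second then deletes everything up to and including that newline, like this: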

sed 's/\([A-Z]\{2,\}.*\)/\n\1/; s/[^\n]*\n//' infile

It yields:

ACQUIRE, obtain, come by, receive, gain, earn, win, come into, take 
RECEIVE, be sent, be in receipt of, be given.
BECOME, grow, turn, go.
FETCH, collect, go for, call for, pick up, bring, deliver, convey, ferry, transport.
EARN, be paid, take home, bring in, make, receive, collect, gross; informal pocket, bank, rake in, net, bag.
APPREHEND, catch.
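The command above relies on `\n` in the replacement and inside a bracket expression, which GNU sed accepts but POSIX leaves unspecified. A minimal sketch of the same two-step idea in more portable syntax follows; the function name `strip_prefix` and the use of `LC_ALL=C` are assumptions for illustration, not part of the answer.

```shell
# Hypothetical helper: insert a newline before the first pair of capitals,
# then delete everything up to and including that newline.
# The replacement uses backslash-newline, so the "&/" line must stay flush left.
strip_prefix() {
  LC_ALL=C sed 's/[A-Z][A-Z]/\
&/
s/^.*\n//' "$@"
}

printf '%s\n' '2 I got your letter: RECEIVE, be sent, be in receipt of, be given.' | strip_prefix
# prints "RECEIVE, be sent, be in receipt of, be given."
```

A lone capital such as the "I" in "I got" is skipped because the pattern needs two capitals in a row. Lines with no such pair pass through unchanged.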

Answer 1: (score: 1)

This should work with awk, but it gives the wrong output on line 5:

awk '{print substr($0,match($0,/[[:upper:]][[:upper:]]/))}' file
ACQUIRE, obtain, come by, receive, gain, earn, win, come into, take
RECEIVE, be sent, be in receipt of, be given.
BECOME, grow, turn, go.
FETCH, collect, go for, call for, pick up, bring, deliver, convey, ferry, transport.
5 the chairman gets
APPREHEND, catch.

match finds the position of the first two consecutive uppercase letters, and substr then uses that position to print the rest of the line.
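A hedged variant of the same approach, as a sketch: it pins the locale to `C` so that `[A-Z]` matches ASCII capitals only, and it checks the result of `match`, which returns 0 when nothing is found. Whether locale handling explains the line 5 result above is an assumption that has not been verified here, and the function name `strip_to_caps` is hypothetical.

```shell
# Hypothetical helper; reads the named files, or stdin when none are given.
strip_to_caps() {
  # LC_ALL=C: treat input as bytes, so [A-Z] means ASCII capitals only.
  LC_ALL=C awk '{
    pos = match($0, /[A-Z][A-Z]/)      # 0 when the line has no such pair
    if (pos > 0) print substr($0, pos) # print from the first pair onward
    else print                         # leave unmatched lines unchanged
  }' "$@"
}

printf '%s\n' '6 have the police got their man?: APPREHEND, catch.' | strip_to_caps
# prints "APPREHEND, catch."
```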