Question

I have some lines in a flat file. Take these 2 lines as an example:

1 aa bb 05 may 2014 cc G 14-MAY-2014 hello world
j  sd  az 20140505    sd  G 14-MAY-2014 hello world haha

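As you may have noticed, I can count neither the number of characters nor the number of spaces, because the lines are not well aligned, and the fourth field sometimes looks like 20140505 and sometimes like 05 may 2014. So what I want is to match the G, or to match 14-MAY-2014. Then I can easily get the field that follows: hello world or hello world haha. Can anybody help me? Thanks!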

Answer 1

Assuming your lines are in a file named test.txt:

 cat test.txt | sed -r 's/^.*-[0-9]{4}\s//'

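This uses GNU sed on a Linux system. There are many other ways to do it. Here I simply delete everything from the start of the line up to and including the date.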

sed -r 's/^.*-[0-9]{4}\s//'

-r = extended regex, makes things like the quantifier {4} possible
's/ ... //' = s is for substitute, 
              it matches the first part and replaces it with the second.
              since the second part is empty, it's a remove/delete
^  = start of line
.* = any character, any number of times
-[0-9]{4} = a dash, followed by four digits ([0-9]), the year part of the date
\s = any white space
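
One caveat with the greedy .* above: it runs up to the last -NNNN that is followed by whitespace, so a trailing field that itself contains something like x-2020 y would be cut as well. A minimal sketch of a stricter variant, assuming the marker field is always a standalone G followed by a DD-MON-YYYY date. The name extract_tail is a hypothetical helper, and -E with POSIX character classes is used in place of -r and \s so that it should also work with BSD sed:

```shell
# Hypothetical helper: reads lines on stdin and removes everything up to and
# including " G DD-MON-YYYY" plus the whitespace that follows the date.
extract_tail() {
  sed -E 's/^.*[[:space:]]G[[:space:]]+[0-9]{2}-[A-Z]{3}-[0-9]{4}[[:space:]]+//'
}

# The two sample lines from the question.
printf '%s\n' \
  '1 aa bb 05 may 2014 cc G 14-MAY-2014 hello world' \
  'j  sd  az 20140505    sd  G 14-MAY-2014 hello world haha' | extract_tail
```

Lines that do not contain the marker are passed through unchanged, the same as with the original command.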

Answer 2

You can use a Perl lookbehind regex:

perl -lne '/(?<=14-MAY-2014)(.*)/ && print $1' file

It prints everything after 14-MAY-2014, including the space that follows the date.

You can also use grep, if -P is supported:

grep -Po '(?<=14-MAY-2014)(.*)'  file
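
Where neither perl nor grep -P is available, plain POSIX shell parameter expansion may be enough. The following is only a sketch under the assumption that the marker string is known in advance; after_marker is a hypothetical helper name, not something from the answers above:

```shell
# Hypothetical helper: reads lines on stdin and, for each line containing the
# marker given as $1, prints the text after its first occurrence.
after_marker() {
  marker=$1
  while IFS= read -r line; do
    case $line in
      *"$marker"*)
        # Remove the shortest prefix that ends with the marker.
        rest=${line#*"$marker"}
        # Trim leading whitespace left over after the marker.
        rest=${rest#"${rest%%[![:space:]]*}"}
        printf '%s\n' "$rest"
        ;;
    esac
  done
}

# The two sample lines from the question.
printf '%s\n' \
  '1 aa bb 05 may 2014 cc G 14-MAY-2014 hello world' \
  'j  sd  az 20140505    sd  G 14-MAY-2014 hello world haha' | after_marker '14-MAY-2014'
```

This is slower than sed or perl on large files because the loop runs in the shell itself, so it is best suited to small inputs.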
