Question

I'm struggling to get the right REGEX for this task:

Match the Nth word of a line containing a specific word

For example:

Input:

this is the first line - blue
this is the second line - green
this is the third line - red

I want to match the 7th word of the line that contains the word « second ».

Expected output:

green

Does anyone know how to do this?

I'm using http://rubular.com/ to test the REGEX.

I have already tried this REGEX without success - it matches into the next line:

(.*second.*)(?<data>.*?\s){7}(.*)

---UPDATE---

Example 2

Input:

this is the Foo line - blue
this is the Bar line - green
this is the Test line - red

I want to match the 4th word of the line that contains the word "red".

Expected output:

Test

In other words - the word I want to match can come either before or after the word I use to select the line.

Answer 1

You can use this to match the line containing second and get the 7th word:

^(?=.*\bsecond\b)(?:\S+ ){6}(\S+)

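Make sure the global and multiline flags are active.

^ matches the beginning of a line.

(?=.*\bsecond\b) is a positive lookahead that ensures the word second appears somewhere on that particular line.

(?:\S+ ){6} matches the first 6 words, each followed by a single space.

(\S+) captures the 7th word.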
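To try the pattern outside an online tester, here is a minimal sketch in Python (my choice of language, the question does not name one). It uses the sample input from the question, and re.MULTILINE stands in for the multiline flag mentioned above.

```python
import re

# Sample input copied from the question.
text = """this is the first line - blue
this is the second line - green
this is the third line - red"""

# re.MULTILINE makes ^ match at the start of every line, not only at the
# start of the whole string. "." does not match a newline by default, so
# the lookahead cannot see past the end of the current line.
pattern = re.compile(r"^(?=.*\bsecond\b)(?:\S+ ){6}(\S+)", re.MULTILINE)

# With a single capture group, findall returns that group for each match.
matches = pattern.findall(text)
print(matches)  # ['green']
```

Note that the pattern expects exactly one space between words. If the input may contain tabs or repeated spaces, replacing the literal space with \s+ would not be safe here, because \s also matches a newline; [ \t]+ is a more cautious substitute.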

regex101 demo

You can apply the same principle to other requirements.

For a line containing red, getting the 4th word:

^(?=.*\bred\b)(?:\S+ ){3}(\S+)
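The two patterns differ only in the keyword and the repeat count, so they can be generated from parameters. The following is a sketch in Python; the function name nth_word_of_lines_containing is hypothetical and only for illustration, and it assumes words are separated by single spaces, as in the examples.

```python
import re

def nth_word_of_lines_containing(text, keyword, n):
    """Return the n-th word (1-based) of every line containing keyword as a whole word.

    Hypothetical helper: builds the same pattern as above with the keyword
    and the count filled in.
    """
    pattern = re.compile(
        # re.escape guards against regex metacharacters in the keyword;
        # n - 1 words are skipped before the captured one.
        r"^(?=.*\b%s\b)(?:\S+ ){%d}(\S+)" % (re.escape(keyword), n - 1),
        re.MULTILINE,
    )
    return pattern.findall(text)

example_1 = """this is the first line - blue
this is the second line - green
this is the third line - red"""

example_2 = """this is the Foo line - blue
this is the Bar line - green
this is the Test line - red"""

print(nth_word_of_lines_containing(example_1, "second", 7))  # ['green']
print(nth_word_of_lines_containing(example_2, "red", 4))     # ['Test']
```

Because the keyword is checked in a lookahead anchored at the start of the line, it does not matter whether the keyword comes before or after the word being captured.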

Answer 2

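You asked for a regex, and you got a very good answer.

Sometimes it is better to ask for a solution rather than specify the tool.

Here is the one-liner that I think best suits your needs: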

awk '/second/ {print $7}' < inputFile.txt

Explanation:

/second/     - for any line that matches this regex (in this case, the literal 'second')
print $7     - print the 7th field (by default, fields are separated by whitespace)

I think it is easier to understand than the regex - and it is more flexible for this kind of processing.
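If awk is not available, the same select-then-split idea can be sketched in Python. The function name awk_like_field is hypothetical, and the code only approximates awk's behaviour: str.split() with no arguments splits on runs of whitespace, and a missing field yields an empty string, as awk would print.

```python
import re

def awk_like_field(lines, pattern, field):
    """For each line matching pattern, collect the given 1-based field."""
    results = []
    for line in lines:
        # Like /second/ in awk, this matches anywhere in the line
        # (it is not restricted to whole words).
        if re.search(pattern, line):
            fields = line.split()  # split on runs of whitespace
            results.append(fields[field - 1] if field <= len(fields) else "")
    return results

sample = [
    "this is the first line - blue",
    "this is the second line - green",
    "this is the third line - red",
]

print(awk_like_field(sample, r"second", 7))  # ['green']
```

Unlike the regex in the first answer, this matches second as a substring, so a line containing "seconds" would also be selected; using r"\bsecond\b" as the pattern would restrict it to the whole word.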

REGEX - match the Nth word of a line containing a specific word