Question

I am new to the regular expression paradigm, and I have run into a problem that I have tried to solve without success.

Imagine a file test.txt:

hello everyone, whatsi up
i hope my program worksa
if it doesnt... ho well!

I want to write to another file, output.txt, only the words that start with a consonant and end with a vowel, which would result in:

hello whatsi
hope worksa
ho

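I am using grep -o '\b[^ aeiouAEIOU]\w*[aeiouAEIOU]\b' test.txt > output.txt, but the -o flag prints each matched string on its own line. What can I do to get the format I want? Another valid option would be to use sed to replace the words that do not match the pattern with spaces, but I could not manage that either. Should I be using sed or awk?

Thanks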

Answer 1

With GNU awk for multi-char RS, RT, and word boundaries:

$ gawk -v RS='\\<[^aeiou][[:alpha:]]*[aeiou]\\>' 'RT{print RT}' file
hello
whatsi
hope
worksa
ho

If you need to retain the original line breaks, use GNU awk for FPAT instead of RS:

$ gawk -v FPAT='\\<[^aeiou][[:alpha:]]*[aeiou]\\>' '{for (i=1; i<=NF; i++) printf "%s%s", $i, (i<NF?OFS:ORS)}' file
hello whatsi
hope worksa
ho

Answer 2

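You can instruct grep to treat your input as a set of null-byte-terminated lines (that is, as one long line if your input contains no null bytes) with the -z / --null-data flag.

This way you can preserve the newlines (note the | |\n at the end of the pattern):

$ grep -Pozi '\b[bcdfghjklmnpqrstvwxyz]\w*[aeiou]\b| |\n' file
hello  whatsi
 hope   worksa
   ho

but at the cost of null byte (\x0) characters in the output (and of multiple spaces, due to our regex). These can be fixed with a few sed expressions: one to strip the null bytes, one to replace multiple spaces with a single space, and two to strip leading and trailing spaces.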
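
The exact sed command is not preserved in this post, so the following is only a sketch of one possible cleanup, assuming GNU grep built with PCRE support (-P). It swaps in tr -d '\0' for the null-byte step (a portable stand-in for a sed expression) and uses three sed expressions for the spaces. The function name keep_words is made up for this example.

```shell
# keep_words: hypothetical helper that reads text on stdin.
# 1. grep -Pozi emits each match (word, space or newline) followed by a null byte.
# 2. tr -d '\0' strips those null bytes (stand-in for the sed expression).
# 3. sed squeezes runs of spaces, then trims a leading and a trailing space.
keep_words() {
    grep -Pozi '\b[bcdfghjklmnpqrstvwxyz]\w*[aeiou]\b| |\n' |
    tr -d '\0' |
    sed -e 's/  */ /g' -e 's/^ //' -e 's/ $//'
}

printf 'hello everyone, whatsi up\ni hope my program worksa\nif it doesnt... ho well!\n' | keep_words
# hello whatsi
# hope worksa
# ho
```

To write the result to a file as in the question, redirect the function's output, for example keep_words < test.txt > output.txt.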

Answer 3

Use the -n option to output line numbers; you can then regroup the matches by line.

For example, in Perl:

grep -no '\b[^ aeiouAEIOU]\w*[aeiouAEIOU]\b' test.txt \
| perl -aF: -nwE 'chomp $F[1];
                push @{ $b[ $F[0] ] }, $F[1]
                }{ say "@$_" for grep defined, @b'
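
The same regrouping can also be sketched without Perl, by letting awk collect the matches that share a line number. This is an illustrative alternative, not part of the original answer; it assumes GNU grep (for \b, \w and -o), and the function name regroup_matches is made up. Like the Perl version, it prints nothing for lines that have no match.

```shell
# regroup_matches: hypothetical helper that reads text on stdin.
# grep -no prints every match as "LINENO:match"; awk joins the matches
# that carry the same line number with single spaces.
regroup_matches() {
    grep -no '\b[^ aeiouAEIOU]\w*[aeiouAEIOU]\b' |
    awk -F: '
        $1 != prev { if (NR > 1) print line; prev = $1; line = $2; next }
        { line = line " " $2 }
        END { if (NR > 0) print line }'
}

printf 'hello everyone, whatsi up\ni hope my program worksa\nif it doesnt... ho well!\n' | regroup_matches
# hello whatsi
# hope worksa
# ho
```

Splitting on ":" is safe here only because the matched words in the sample contain no colon; text with colons inside a match would need a different separator strategy.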

Answer 4

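Perl by itself works well here: for each line, find each word that meets the criteria.

- -a splits the line into words, stored in the array @F
- grep keeps only the words that match the regex
- the resulting list is then joined with spaces and printed
- if no word on a line matches, an empty line is printed

Golfed:

perl -lape'$_="@{[grep{/\b(?=[a-z])[^aeiou][a-z]*[aeiou]\b/i}@F]}"' file

Note that digits match [^aeiou], which is why I added the lookahead (?=[a-z]) to restrict the first character of the word to a letter that is not a vowel.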
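
The effect of the lookahead can be checked with a small experiment. This is only a sketch, assuming GNU grep with PCRE support (-P); the sample words and the helper name words are made up for the illustration.

```shell
# Hypothetical sample: one word starting with a digit, one with a consonant.
words() { printf '4you\nhello\n'; }

# Without the lookahead, the digit passes as "not a vowel":
words | grep -oPi '\b[^aeiou][a-z]*[aeiou]\b'
# 4you
# hello

# With the lookahead, the first character must also be a letter:
words | grep -oPi '\b(?=[a-z])[^aeiou][a-z]*[aeiou]\b'
# hello
```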

Answer 5

With the help of grep's -P option:

~ ❱ grep -Po '\w+' file
hello
everyone
whatsi
up
i
hope
my
program
worksa
if
it
doesnt
ho
well
~ ❱ grep -Po '\b(?![oauie])[a-z]+((?=[oauie]).)\b' file
hello
whatsi
hope
worksa
ho
~ ❱ 
~ ❱ # return in a single line:
~ ❱ grep -zPo '\b(?![oauie])[a-z]+((?=[aeiou]).) \b' file
hello whatsi hope ho ~❱
~ ❱ 
~ ❱

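How it works

-P enables PCRE, and the pattern works in these steps:

it does not match any of [aieuo] at the start of a word
then it matches some characters [a-z]+ if:
there is one of [aieuo] at the end of the word

Note

My answer does not preserve the lines the words came from. I wanted to write a Perl one-liner and then noticed that @glenn jackman had already done so. So you can use that answer, or: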

~ ❱ perl -lae' print for "@{[ grep{/\b(?![oauie])[a-z]+((?=[oauie]).)\b/} @F ]}" ' file
hello whatsi
hope worksa
ho
~ ❱

Or without the "@{[ ... ]}" construct:

~ ❱ perl -lae '@arr = grep /\b(?![oauie])[a-z]+((?=[oauie]).)\b/, @F;print "@arr"' file
hello whatsi
hope worksa
ho
~ ❱

Answer 6

The following awk solution may also help you.

awk '{for(i=1;i<=NF;i++){if(tolower($i) ~ /^[^aeiou].*[aeiou]$/){val=val?val OFS $i:$i}};print val;val=""}'  Input_file

The output is as follows.

hello whatsi
hope worksa
ho

A non-one-liner form of the solution, with explanations, is added here as well.

awk '{
  for(i=1;i<=NF;i++){       ## Loop over every field (word) of the current line, from 1 to NF.
    if(tolower($i) ~ /^[^aeiou].*[aeiou]$/){ ## Lowercase the field and test that it does not start with a vowel and ends with one.
      val=val?val OFS $i:$i ## Append the field to val, separated by OFS unless val is still empty.
    }
  }
  print val                 ## After the loop, print the words collected for this line.
  val=""                    ## Reset val before the next line is read.
}
' Input_file                ## Input_file is the file to process.

Answer 7

Here is a regular expression that matches words starting with a consonant and ending with a vowel:

/\<[^ aeiouAEIOU]\w*[aeiouAEIOU]\>/

We can use it to select our words and delete everything else with the Ex/Vim editor.

So given the test.txt file created by the following command:

$ printf "hello everyone, whatsi up\ni hope my program worksa\nif it doesnt... ho well!" > test.txt

This shell command reads the file and saves the parsed output to the out.txt file:

$ ex -s +'%s/\<\w\+\>\(\<[^ aeiouAEIOU]\w*[aeiouAEIOU]\>\)\@<!\s\?//g' +"%s/\([[:punct:]]\+\)//g" +%p +'wq! out.txt' test.txt 
hello  whatsi 
hope worksa
 ho

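Explanation:

\<\w\+\> - selects every word;
\(\<[^ aeiouAEIOU]\w*[aeiouAEIOU]\>\) - selects the words to keep;
\@<! - matches with zero width if the preceding atom does NOT match just before what follows (see :help \@<!);
%s/pattern/replace/g - replaces the pattern with the replacement text;
%s/\([[:punct:]]\+\)//g - removes all punctuation characters;
+%p - prints the file buffer to standard output;
wq! file.txt - writes the current buffer to a file;

The above solution is based on this answer: How to remove all words which doesn't match the pattern?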

Using grep or sed to replace words that do not match a pattern
