Question

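Is there a way to make grep output only the "words" that match the search expression, rather than the whole matching lines?

If I want to find all instances of, say, "th" in a number of files, I can do: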

grep "th" *

But the output will be something like this:

some-text-file : the cat sat on the mat  
some-other-text-file : the quick brown fox  
yet-another-text-file : i hope this explains it thoroughly

What I want it to output, using the same search, is:

the
the
the
this
thoroughly

Is this possible with grep? Or with some other combination of tools?

Answer 1

Try grep -o:

grep -oh "\w*th\w*" *

Edit: updated to match Phil's comment

From the docs:

-h, --no-filename
    Suppress the prefixing of file names on output. This is the default
    when there is only  one  file  (or only standard input) to search.
-o, --only-matching
    Print  only  the matched (non-empty) parts of a matching line,
    with each such part on a separate output line.
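A minimal way to try this is to recreate the question's three sample files in a scratch directory and run the same command. The file names and contents are simply the question's example; the sketch assumes a grep that understands \w (GNU grep does).

```shell
# Scratch directory holding copies of the question's sample files
dir=$(mktemp -d)
cd "$dir" || exit 1
printf '%s\n' 'the cat sat on the mat'             > some-text-file
printf '%s\n' 'the quick brown fox'                > some-other-text-file
printf '%s\n' 'i hope this explains it thoroughly' > yet-another-text-file

# -o: print only the matched part of each line, one match per output line
# -h: drop the "filename:" prefix grep adds when searching several files
words=$(grep -oh '\w*th\w*' *)
printf '%s\n' "$words"

cd / && rm -rf "$dir"
```

It prints the, the, the, this and thoroughly, one per line, which is exactly the output the question asks for.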

Answer 2

Cross-distribution safe answer (including Windows MinGW?)

grep -h "[[:alpha:]]*th[[:alpha:]]*" 'filename' | tr ' ' '\n' | grep -h "[[:alpha:]]*th[[:alpha:]]*"

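If you are using an older version of grep (such as 2.4.2) that does not include the -o option, use the command above. Otherwise, use the simpler, easier-to-maintain version.

Linux cross-distribution safe answer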

grep -oh "[[:alpha:]]*th[[:alpha:]]*" 'filename'

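In summary: -oh outputs the regular-expression matches from the file contents (not the file names), just as you would expect a regular expression to work in vim etc. Which word or regular expression you search for is up to you, as long as you stick to POSIX rather than Perl syntax (see below).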

More from the manual for grep

-o      Print each match, but only the match, not the entire line.
-h      Never print filename headers (i.e. filenames) with output lines.
-w      The expression is searched for as a word (as if surrounded by
         `[[:<:]]' and `[[:>:]]';

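Why the original answer doesn't work for everyone

The support for \w varies from platform to platform, because it is an extended "Perl" syntax. For that reason, grep installations limited to POSIX character classes use [[:alpha:]] instead of its Perl counterpart \w (strictly speaking, \w corresponds to [[:alnum:]_], so it also matches digits and underscores). See the Wikipedia page on regular expression for more

Ultimately, the POSIX answers above will be more reliable than the original, whatever platform grep is running on.

For greps that lack the -o option, the first grep outputs the relevant lines, tr splits them on spaces into one word per line, and the final grep keeps only the matching words.

(PS: I know that by now most platforms have been patched to support \w... but there are always some that lag behind.)

Credit to @AdamRosenfield's answer for the "-o" workaround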
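A sketch of the no -o fallback, run on scratch copies of the question's sample files. It uses the same POSIX character classes and the same grep | tr | grep pipeline as the answer.

```shell
dir=$(mktemp -d)
cd "$dir" || exit 1
printf '%s\n' 'the cat sat on the mat'             > some-text-file
printf '%s\n' 'the quick brown fox'                > some-other-text-file
printf '%s\n' 'i hope this explains it thoroughly' > yet-another-text-file

# 1. first grep: keep only lines containing "th" (-h hides file names)
# 2. tr: put each space-separated token on its own line
# 3. second grep: keep only the tokens that contain "th"
words=$(grep -h '[[:alpha:]]*th[[:alpha:]]*' * |
  tr ' ' '\n' |
  grep '[[:alpha:]]*th[[:alpha:]]*')
printf '%s\n' "$words"

cd / && rm -rf "$dir"
```

Because tr splits only on spaces, a token with punctuation attached (for example "thoroughly.") keeps that punctuation, whereas grep -o with [[:alpha:]] stops at the period.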

Answer 3

You can translate spaces into newlines and then grep, for example:

cat * | tr ' ' '\n' | grep th

Answer 4

Just awk; no need to combine tools. This prints every field that begins with "th":

# awk '{for(i=1;i<=NF;i++){if($i~/^th/){print $i}}}' file
the
the
the
this
thoroughly
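The test $i~/^th/ is anchored, so words with "th" in the middle or at the end are skipped. A small sketch contrasting it with an unanchored test, on a hypothetical one-line file sample.txt:

```shell
dir=$(mktemp -d)
cd "$dir" || exit 1
printf '%s\n' 'the other path' > sample.txt   # hypothetical sample input

# Anchored (as in the answer): only fields that START with "th"
start=$(awk '{for(i=1;i<=NF;i++){if($i~/^th/){print $i}}}' sample.txt)
# Unanchored: any field that CONTAINS "th"
anywhere=$(awk '{for(i=1;i<=NF;i++){if($i~/th/){print $i}}}' sample.txt)

printf 'anchored:\n%s\nunanchored:\n%s\n' "$start" "$anywhere"
cd / && rm -rf "$dir"
```

The anchored version prints only "the"; the unanchored one also prints "other" and "path".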

Answer 5

It's simpler than you think. Try this:

egrep -wo 'th.[a-z]*' filename.txt #### (Case Sensitive)

egrep -iwo 'th.[a-z]*' filename.txt  ### (Case Insensitive)

Where:

 egrep: grep with extended regular expressions (same as grep -E).
 w    : Match only whole words instead of substrings.
 o    : Display only the matched pattern instead of the whole line.
 i    : Ignore case.

Answer 6

A grep command with only-matching output and Perl regular expressions:

grep -o -P 'th.*? ' filename

Answer 7

cat *-text-file | grep -Eio "th[a-z]+"

Answer 8

I was unhappy with awk's hard-to-remember syntax, but I liked the idea of using a single utility for this.

It looks like ack (ack-grep if you're on Ubuntu) can do this easily:

# ack-grep -ho "\bth.*?\b" *

the
the
the
this
thoroughly

If you omit the -h flag, you get:

# ack-grep -o "\bth.*?\b" *

some-other-text-file
1:the

some-text-file
1:the
the

yet-another-text-file
1:this
thoroughly

As a bonus, you can use the --output flag for more complex searches, with the simplest syntax I've found:

# echo "bug: 1, id: 5, time: 12/27/2010" > test-file
# ack-grep -ho "bug: (\d*), id: (\d*), time: (.*)" --output '$1, $2, $3' test-file

1, 5, 12/27/2010

Answer 9

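To search for all words starting with "icon-", the following command works perfectly (the leading \w* also lets it catch words that merely contain "icon-"). I'm using ack here, which is similar to grep but has better options and nicer formatting.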

ack -oh --type=html "\w*icon-\w*" | sort | uniq
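If ack isn't installed, the same deduplication step works on grep's -o output. This sketch swaps in GNU grep -oh for ack -oh and uses sort -u, which is equivalent to sort | uniq; page.html is a hypothetical file.

```shell
dir=$(mktemp -d)
cd "$dir" || exit 1
# Hypothetical HTML snippet; icon-home appears twice on purpose
printf '%s\n' '<i class="icon-home"></i> <i class="icon-user"></i>' \
              '<i class="icon-home"></i>' > page.html

# grep -oh stands in for ack -oh; sort -u removes the duplicate icon-home
unique=$(grep -oh '\w*icon-\w*' page.html | sort -u)
printf '%s\n' "$unique"

cd / && rm -rf "$dir"
```

Replacing sort -u with sort | uniq -c would also show how often each class name occurs.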

Answer 10

You could also try pcregrep. grep also has a -w option, but in some cases it does not work as expected.

From Wikipedia:

cat fruitlist.txt
apple
apples
pineapple
apple-
apple-fruit
fruit-apple

grep -w apple fruitlist.txt
apple
apple-
apple-fruit
fruit-apple
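The surprise comes from what counts as a word: only letters, digits and the underscore are word constituents, so "-" acts as a boundary. A sketch that reproduces the example above; if you want only lines that are exactly "apple", grep -x (match the whole line) is one alternative.

```shell
dir=$(mktemp -d)
cd "$dir" || exit 1
printf '%s\n' apple apples pineapple apple- apple-fruit fruit-apple > fruitlist.txt

# -w: the match must be bounded by non-word characters or line edges;
# '-' is not a word character, so "apple-" and "fruit-apple" still match
word=$(grep -w apple fruitlist.txt)
# -x: the match must span the entire line
line=$(grep -x apple fruitlist.txt)

printf -- '-w:\n%s\n-x:\n%s\n' "$word" "$line"
cd / && rm -rf "$dir"
```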

Answer 11

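I had a similar problem: I was looking for a grep/pattern regex and wanted only the matched pattern as output.

In the end I used egrep with -o (the same regex with grep -e or -G did not give me the same results as egrep).

So I think this could be something like (I'm no regex guru):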

egrep -o "the*|this{1}|thoroughly{1}" filename

Answer 12

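You can pipe the grep output into Perl, like this (-h stops grep from prefixing each line with its file name, which Perl would otherwise scan too):

grep -h "th" * | perl -n -e'while(/(\w*th\w*)/g) {print "$1\n"}'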

Answer 13

$ grep -w

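From the grep man page:

-w: Select only those lines containing matches that form whole words. The test is that the matching substring must either be at the beginning of the line, or preceded by a non-word constituent character. Similarly, it must be either at the end of the line or followed by a non-word constituent character.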
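On its own, -w still prints whole lines; combining it with -o and -h gives just the matching words. A sketch on scratch copies of the question's sample files, assuming GNU grep for -o and -h:

```shell
dir=$(mktemp -d)
cd "$dir" || exit 1
printf '%s\n' 'the cat sat on the mat'             > some-text-file
printf '%s\n' 'the quick brown fox'                > some-other-text-file
printf '%s\n' 'i hope this explains it thoroughly' > yet-another-text-file

# -w alone: whole lines that contain a whole-word match
lines=$(grep -hw 'th[[:alpha:]]*' *)
# -w with -o: only the whole-word matches themselves
words=$(grep -ohw 'th[[:alpha:]]*' *)

printf '%s\n---\n%s\n' "$lines" "$words"
cd / && rm -rf "$dir"
```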

Answer 14

`ripgrep`

Here is an example using ripgrep:

rg -o "(\w+)?th(\w+)?"

It will match all words that contain th.
