Question

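Here's a quick one for you RegEx wizards. I need a regular expression that finds groups of words. Any group of words. For example, I'd like it to find the first two words in any sentence.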

Example: "Hi there, how are you?" - the return would be "Hi there"

Example: "How are you doing?" - the return would be "How are"

Answer 1

Try this:

^\w+\s+\w+

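Explanation: starting at the beginning of the string, one or more word characters, then one or more whitespace characters, then one or more word characters.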
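To see how this pattern behaves, here is a minimal sketch in Python (the function name `first_two_words` is made up for illustration; the pattern itself is the one from this answer). Note that it only matches when nothing but whitespace separates the first two words, so a comma right after the first word makes it fail.

```python
import re

# The pattern from this answer: anchored at the start of the string.
FIRST_TWO = re.compile(r"^\w+\s+\w+")

def first_two_words(sentence):
    """Return the first two whitespace-separated words, or None if no match."""
    m = FIRST_TWO.match(sentence)
    return m.group(0) if m else None

print(first_two_words("Hi there, how are you?"))  # Hi there
print(first_two_words("How are you doing?"))      # How are
print(first_two_words("Hi, there"))               # None (comma after "Hi")
```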

Answer 2

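Regular expressions can be used to parse language; they are one of the more natural tools for it. After gathering the words, use a dictionary to check whether they are actually words in a particular language.

The premise is to define a regular expression that will split out 99.9% of possible words, where what counts as a "word" is the key definition.

I assume C# uses a PCRE based on Perl 5.8. This is my ASCII definition of how to split out words (expanded):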

regex = '[\s[:punct:]]* (\w (?: \w | [[:punct:]](?=[\w[:punct:]]) )* )'

And Unicode (more must be added or subtracted to suit specific encodings):

regex = '[\s\pP]* ([\pL\pN_-] (?: [\pL\pN_-] | \pP(?=[\pL\pN\pP_-]) )* )'

To find ALL of the words, interpolate the regex string into a regex (shown in Perl; I don't know C#):

@matches = $text =~ /$regex/xg;

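where /xg are the extended and global modifiers. Note that the regex string contains only capture group 1, so the intervening text is not captured.

To find just the FIRST TWO: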

@matches = $text =~ /(?:$regex)(?:$regex)/x;

Below is a Perl sample. Anyway, play around with it. Cheers!

use strict;
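use warnings;
use utf8;  # the sample text below contains literal non-ASCII characters (assumes this file is saved as UTF-8)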

binmode (STDOUT,':utf8');

# Unicode
my $regex = qr/ [\s\pP]* ([\pL\pN_-] (?: [\pL\pN_-] | \pP(?=[\pL\pN\pP_-]) )* ) /x;

# Ascii
# my $regex = qr/ [\s[:punct:]]* (\w (?: \w | [[:punct:]](?=[\w[:punct:]]) )* ) /x;


my $text = q(
  I confirm that sufficient information and detail have been
  reported in this technical report, that it's "scientifically" sound,
  and that appropriate conclusion's have been included
);
print "\n**\n$text\n"; 

my @matches = $text =~ /$regex/g;
print "\nTotal ".scalar(@matches)." words\n",'-'x20,"\n";
for (@matches) {
    print "$_\n";
}

# =======================================

my $junk = q(
Hi, there, A écafé and Horse d'oeuvre 
hasn't? 'n? '? a-b? -'a-? 
);
print "\n\n**\n$junk\n"; 

# First 2 words
@matches = $junk =~ /(?:$regex)(?:$regex)/;
print "\nFirst 2 words\n",'-'x20,"\n";
for (@matches) {
    print "$_\n";
}

# All words
@matches = $junk =~ /$regex/g;
print "\nTotal ".scalar(@matches)." words\n",'-'x20,"\n";
for (@matches) {
    print "$_\n";
}

Output:
**

I confirm that sufficient information and detail have been
reported in this technical report, that it's "scientifically" sound,
and that appropriate conclusion's have been included

Total 25 words
--------------------
I
confirm
that
sufficient
information
and
detail
have
been
reported
in
this
technical
report
that
it's
scientifically
sound
and
that
appropriate
conclusion's
have
been
included

**

Hi, there, A écafé and Horse d'oeuvre
hasn't? 'n? '? a-b? -'a-?

First 2 words
--------------------
Hi
there

Total 11 words
--------------------
Hi
there
A
écafé
and
Horse
d'oeuvre
hasn't
n
a-b
a-
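For readers without Perl at hand, here is a rough Python port of the ASCII variant. It is a sketch under a few assumptions: Python's `re` module has no `[[:punct:]]` or `\pP`, so `string.punctuation` stands in for the POSIX punctuation class, `re.ASCII` restricts `\w` and `\s` to ASCII, and the names `PUNCT`, `WORD`, `all_words` and `first_two_words` are made up for illustration.

```python
import re
import string

# ASCII punctuation, escaped so it can sit safely inside a character class.
# Assumption: this stands in for Perl's [[:punct:]].
PUNCT = re.escape(string.punctuation)

# Skip leading whitespace/punctuation, then capture a word: it starts with a
# word character and may contain punctuation only when that punctuation is
# followed by another word character or more punctuation.
WORD = rf"[\s{PUNCT}]*(\w(?:\w|[{PUNCT}](?=[\w{PUNCT}]))*)"

def all_words(text):
    """Return every word; findall yields capture group 1 of each match."""
    return re.findall(WORD, text, flags=re.ASCII)

def first_two_words(text):
    """Return the first two words as a tuple, or None if there are fewer."""
    m = re.search(WORD + WORD, text, flags=re.ASCII)
    return m.groups() if m else None

print(all_words('it\'s "scientifically" sound,'))
print(first_two_words("Hi, there, how are you?"))
```

Concatenating `WORD` with itself plays the role of `(?:$regex)(?:$regex)` in the Perl version: each copy contributes one capture group, so the match yields exactly two words.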

Answer 3

@Rubens Farias:

Following up on my comment, here is the code I used:

// Requires: using System.Text.RegularExpressions;
public int startAt = 0;

private void btnGrabWordPairs_Click(object sender, EventArgs e)
{
    // Start at a word boundary, then one or more word chars, one or more
    // whitespace chars, one or more word chars, ending at a word boundary.
    Regex regex = new Regex(@"\b\w+\s+\w+\b");

    if (startAt <= txtTest.Text.Length)
    {
        Match match = regex.Match(txtTest.Text, startAt);
        if (match.Success)
        {
            MessageBox.Show(match.Value);
            // Resume right after this match. Index + Length also accounts
            // for any characters skipped before the match began.
            startAt = match.Index + match.Length;
        }
    }
}

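Each time the button is clicked, it grabs the next pair of words quite nicely, working through the text in the txtTest TextBox and finding the pairs in order until it reaches the end of the string.

@sln: Thanks for the very detailed response!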
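The same stepping idea can be sketched outside a UI as a small Python generator. This is an illustration rather than a port of the handler: the name `word_pairs` is made up, and the start position is moved to the end of each match (`m.end()`) so that characters skipped before a match are not counted twice.

```python
import re

# Same pattern as in the C# handler: two words separated by whitespace.
PAIR = re.compile(r"\b\w+\s+\w+\b")

def word_pairs(text):
    """Yield successive non-overlapping word pairs, left to right."""
    pos = 0
    while pos <= len(text):
        m = PAIR.search(text, pos)
        if m is None:
            break
        yield m.group(0)
        pos = m.end()  # resume where this match ended

print(list(word_pairs("one two three four five")))
# ['one two', 'three four']
```

A trailing single word (here "five") is never returned, because the pattern requires a second word after the whitespace.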

Regular expression to find separate words?