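Here's a quick one for you RegEx wizards. I need a regular expression that finds groups of words. Any group of words. For example, I'd like it to find the first two words in any sentence.
Example: "Hi there, how are you?" - the return would be "Hi there"
Example: "How are you doing?" - the return would be "How are"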
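Answer 0 (score: 4)
Try this:
^\w+\s+\w+
Explanation: starting at the beginning of the string, match one or more word characters, then one or more whitespace characters, then one or more word characters.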
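As a quick check of the pattern above, here is a minimal Python sketch. The helper name `first_two_words` and the sample sentences are illustrative assumptions, not part of the answer. It also shows a limitation: punctuation directly after the first word prevents a match, because `\s+` requires whitespace there.

```python
import re

# The pattern from the answer: two runs of word characters separated by
# whitespace, anchored at the start of the string.
FIRST_TWO = re.compile(r"^\w+\s+\w+")

def first_two_words(sentence):
    """Return the first two words as one string, or None if there is no match."""
    match = FIRST_TWO.search(sentence)
    return match.group() if match else None

print(first_two_words("Hi there, how are you?"))  # Hi there
print(first_two_words("How are you doing?"))      # How are
print(first_two_words("Hi, there"))               # None (comma follows "Hi")
```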
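Answer 1 (score: 2)
Regular expressions can be used to parse natural language, and they are a fairly natural tool for the job. Once the words have been collected, use a dictionary to check whether they really are words in a particular language.
The premise is to define a regular expression that will split out 99.9% of possible words, where what counts as a "word" is the key definition.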
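The split-then-look-up idea can be sketched roughly as follows in Python. Everything here is an assumption for illustration: `KNOWN_WORDS` is a hypothetical five-word dictionary, and `split_candidates` is a deliberately simple stand-in for the fuller word patterns this answer goes on to define.

```python
import re

# Hypothetical mini-dictionary for illustration; a real check would load a
# full word list for the target language.
KNOWN_WORDS = {"the", "cat", "sat", "on", "mat"}

def split_candidates(text):
    # Simplified stand-in splitter: runs of ASCII letters and apostrophes.
    return re.findall(r"[A-Za-z']+", text)

def classify(text, known=KNOWN_WORDS):
    """Split text into candidates, then sort them into (known, unknown)."""
    recognized, unrecognized = [], []
    for word in split_candidates(text):
        bucket = recognized if word.lower() in known else unrecognized
        bucket.append(word)
    return recognized, unrecognized

print(classify("The cat sat on the zzyx mat."))
```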
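I think C# uses a PCRE-style flavor based on Perl 5.8. Here is my ASCII definition of how to split out words (expanded):
regex = '[\s[:punct:]]* (\w (?: \w | [[:punct:]](?=[\w[:punct:]]) )* )'
And the Unicode version (more must be added or subtracted to suit specific encodings):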
regex = '[\s\pP]* ([\pL\pN_-] (?: [\pL\pN_-] | \pP(?=[\pL\pN\pP_-]) )* )'
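To find ALL of the words, interpolate the regex string into a regex match (I don't know C#):
@matches = $text =~ /$regex/xg
Here /x and /g are the extended and global modifiers. Note that the regex string contains only capture group 1, so the intervening text is not captured.
To find just the FIRST TWO:
@matches = $text =~ /(?:$regex)(?:$regex)/x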
A Perl sample follows. Anyway, play around with it. Cheers!
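use strict;
use warnings;
use utf8;  # assumes this file is saved as UTF-8, so literals like "écafé" are decoded as characters
binmode (STDOUT,':utf8');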
# Unicode
my $regex = qr/ [\s\pP]* ([\pL\pN_-] (?: [\pL\pN_-] | \pP(?=[\pL\pN\pP_-]) )* ) /x;
# Ascii
# my $regex = qr/ [\s[:punct:]]* (\w (?: \w | [[:punct:]](?=[\w[:punct:]]) )* ) /x;
my $text = q(
I confirm that sufficient information and detail have been
reported in this technical report, that it's "scientifically" sound,
and that appropriate conclusion's have been included
);
print "\n**\n$text\n";
my @matches = $text =~ /$regex/g;
print "\nTotal ".scalar(@matches)." words\n",'-'x20,"\n";
for (@matches) {
    print "$_\n";
}
# =======================================
my $junk = q(
Hi, there, A écafé and Horse d'oeuvre
hasn't? 'n? '? a-b? -'a-?
);
print "\n\n**\n$junk\n";
# First 2 words
@matches = $junk =~ /(?:$regex)(?:$regex)/;
print "\nFirst 2 words\n",'-'x20,"\n";
for (@matches) {
    print "$_\n";
}
# All words
@matches = $junk =~ /$regex/g;
print "\nTotal ".scalar(@matches)." words\n",'-'x20,"\n";
for (@matches) {
    print "$_\n";
}
Output:
**
I confirm that sufficient information and detail have been
reported in this technical report, that it's "scientifically" sound,
and that appropriate conclusion's have been included
Total 25 words
--------------------
I
confirm
that
sufficient
information
and
detail
have
been
reported
in
this
technical
report
that
it's
scientifically
sound
and
that
appropriate
conclusion's
have
been
included
**
Hi, there, A écafé and Horse d'oeuvre
hasn't? 'n? '? a-b? -'a-?
First 2 words
--------------------
Hi
there
Total 11 words
--------------------
Hi
there
A
écafé
and
Horse
d'oeuvre
hasn't
n
a-b
a-
Answer 2 (score: 0)
@Rubens Farias:
As per my comment, here's the code I used:
// Requires: using System.Text.RegularExpressions; (inside a Windows Forms class)
public int startAt = 0;

private void btnGrabWordPairs_Click(object sender, EventArgs e)
{
    // Start at a word boundary, match one or more word chars, one or more
    // whitespace chars, one or more word chars, then end at a word boundary.
    Regex regex = new Regex(@"\b\w+\s+\w+\b");
    if (startAt <= txtTest.Text.Length)
    {
        Match match = regex.Match(txtTest.Text, startAt);
        if (match.Success)
        {
            MessageBox.Show(match.Value);
            // Resume just past this match; it may begin later than startAt.
            startAt = match.Index + match.Length;
        }
    }
}
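Each time the button is clicked, it grabs a word pair quite nicely, working through the text in the txtTest TextBox and finding pairs in order until it reaches the end of the string.
@sln: Thanks for the very detailed reply!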
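The resume-from-the-last-match idea in this answer can also be sketched outside a UI. The Python generator below is an illustration only: the name `word_pairs` and the sample sentence are assumptions. It advances to the end of each match rather than adding the match length to the previous start, since a match may begin after skipped punctuation or whitespace.

```python
import re

# Same pattern as the C# answer: two words separated by whitespace.
PAIR = re.compile(r"\b\w+\s+\w+\b")

def word_pairs(text):
    """Yield successive non-overlapping word pairs, left to right."""
    start_at = 0
    while start_at <= len(text):
        match = PAIR.search(text, start_at)
        if match is None:
            break
        yield match.group()
        # Continue from the end of this match, wherever it began.
        start_at = match.end()

print(list(word_pairs("Hi there, how are you doing today?")))
# ['Hi there', 'how are', 'you doing']
```

The trailing "today" is not reported because no second word follows it.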