Regular expression to match anything between two strings except words

Time: 2014-04-07 07:49:35

Tags: c# regex pdftotext

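I have a sentence. Between its start and end markers it may contain any special characters, digits, or single letters, but it must not contain words.

To make this clearer, here is an example.

I have a sentence like "Today's Market value 0.5 percent".

In this sentence, no other word may appear between "Market value" and "percent".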

Statements allowed:
1) "Today's Market value*    0.5 percent"
2) "Today's Market value\1   0.5 percent"
3) "Today's Market value \1 0.5 percent"
4) "Today's Market value e   0.5 percent"
5) "Today's Market value 0.5 percent"

Statements not allowed:
1) "Today's market value is    0.5 percent"
2) "Today's market value  is 0.5 percent"
3) "Today's Market value is 0.5 percent"

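What I mainly want to extract here is the market value, i.e. "0.5".

Please suggest the right way to build a regular expression that meets the requirement above.

2 Answers:

Answer 0: (score: 0)

Try this regex: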

\bMarket value\b(?!\s+is\s)[\s\S]*?(\d+(?:\.\d+)?)\s*percent\b

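(?!\s+is\s) is a negative lookahead: it rejects the match when Market value is followed by whitespace, the word is, and another whitespace character.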
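As a rough check of this pattern against some of the question's sample sentences, here is a minimal sketch. The class name Answer0Demo and the helper ExtractValue are hypothetical names introduced only for illustration; the number is read from capture group 1. Note that the lookahead only excludes the literal word is, so a sentence with a different word between Market value and the number would still match.

```csharp
using System;
using System.Text.RegularExpressions;

public static class Answer0Demo
{
    // Pattern copied from the answer; group 1 captures the number before "percent".
    // Matching is case-sensitive, so lowercase "market value" never matches.
    private static readonly Regex Pattern = new Regex(
        @"\bMarket value\b(?!\s+is\s)[\s\S]*?(\d+(?:\.\d+)?)\s*percent\b");

    // Hypothetical helper: returns the captured number, or null when nothing matches.
    public static string ExtractValue(string sentence)
    {
        Match m = Pattern.Match(sentence);
        return m.Success ? m.Groups[1].Value : null;
    }

    public static void Main()
    {
        string[] samples =
        {
            @"Today's Market value*    0.5 percent",   // allowed
            @"Today's Market value \1 0.5 percent",    // allowed
            @"Today's Market value is 0.5 percent",    // not allowed (rejected by the lookahead)
            @"Today's market value is    0.5 percent", // not allowed (fails on case alone)
        };
        foreach (string s in samples)
            Console.WriteLine("{0} : {1}", ExtractValue(s) ?? "(no match)", s);
    }
}
```

Reading the value from Groups[1] rather than from the whole match is what keeps the surrounding Market value ... percent text out of the result.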


Answer 1: (score: 0)

Here is code that extracts the number of interest when the string is valid:

using System;
using System.Text.RegularExpressions;

// Sample sentences from the question: the first five are allowed, the last three are not.
string[] strList = new[] {
    @"Today's Market value*    0.5 percent",
    @"Today's Market value\1   0.5 percent",
    @"Today's Market value \1 0.5 percent",
    @"Today's Market value e   0.5 percent",
    @"Today's Market value 0.5 percent",
    @"Today's market value is    0.5 percent",
    @"Today's market value  is 0.5 percent",
    @"Today's Market value is 0.5 percent"
};
foreach (string str in strList)
{
    // Lookbehinds constrain the text before the number, lookaheads the text after it,
    // so m.Value is the number alone.
    Match m = Regex.Match(str, @"(?<=Market value.*\s)(?<!Market value.*[a-zA-Z]{2}.*)\d+(\.\d+)?(?=\s.*percent)(?!.*[a-zA-Z]{2}.*percent)", RegexOptions.Singleline);
    // Only valid sentences are printed.
    if (m.Success)
        Console.WriteLine("{0} : {1}", m.Value, str);
}

Output:

0.5 : Today's Market value*    0.5 percent
0.5 : Today's Market value\1   0.5 percent
0.5 : Today's Market value \1 0.5 percent
0.5 : Today's Market value e   0.5 percent
0.5 : Today's Market value 0.5 percent

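Basic idea: the number must be preceded by the text Market value, then any text, then whitespace, but it must not be preceded by Market value followed by 2 or more consecutive letters. Likewise, the number must be followed by whitespace, any text, and percent, but it must not be followed by 2 or more consecutive letters and then percent.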
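To make the four conditions easier to see, the same pattern can be spread over several lines with RegexOptions.IgnorePatternWhitespace. This is only a sketch: the class name LookaroundDemo and the helper Extract are hypothetical, and the literal space in Market value is written as [ ] because unescaped whitespace in the pattern is ignored under that option.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LookaroundDemo
{
    // The answer's pattern, one lookaround per line, with inline # comments.
    private const string Pattern = @"
        (?<=Market[ ]value.*\s)             # preceded by 'Market value', any text, then whitespace
        (?<!Market[ ]value.*[a-zA-Z]{2}.*)  # but no two consecutive letters after 'Market value'
        \d+(\.\d+)?                         # the number itself, e.g. 0.5
        (?=\s.*percent)                     # followed by whitespace, any text, then 'percent'
        (?!.*[a-zA-Z]{2}.*percent)          # but no two consecutive letters ahead of 'percent'
    ";

    // Hypothetical helper: returns the matched number, or null for a rejected sentence.
    public static string Extract(string sentence)
    {
        Match m = Regex.Match(sentence, Pattern,
            RegexOptions.Singleline | RegexOptions.IgnorePatternWhitespace);
        return m.Success ? m.Value : null;
    }

    public static void Main()
    {
        // A single letter between the markers is allowed; a two-letter word is not.
        Console.WriteLine(Extract(@"Today's Market value e   0.5 percent") ?? "(no match)");
        Console.WriteLine(Extract(@"Today's Market value is 0.5 percent") ?? "(no match)");
    }
}
```

Because every constraint sits in a lookaround, the match itself consists of the number alone, which is why the original code can print m.Value directly.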