Regex to ignore the last word

Time: 2012-01-10 06:39:38

Tags: c# regex

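Can anyone help? (Also posted on the RegexBuddy forum.)

I have this relatively large (auto-generated) regular expression, listed in full at the bottom, in which the following fragment is repeated many times: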

    # Add words to word list
    (?<_KC1>(?:(?:\w|[ \t\\/]|\[\w*\])*?))

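This is there to "scoop up" the words and text that sit between the better-known fragments. These captures are all aggregated in code to give the list of words in the overall match.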
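The aggregation code itself is not shown in this post. As a rough sketch of what such a step might look like, the snippet below collects every capture from groups whose names start with `_KC` and splits them into words. The class name `WordListDemo`, the helper `CollectWords`, the `Month`/`Year` groups and the cut-down pattern are all assumptions made for illustration, not the real generated expression.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class WordListDemo
{
    // The repeated "word list" fragment from the post (lazy).
    const string WordList = @"(?:\w|[ \t\\/]|\[\w*\])*?";

    // Hypothetical cut-down pattern: product symbol, word list, separator, month and year.
    public static readonly Regex Pattern = new Regex(
        @"\b(?<ProductSymbol>WTI|Brent)\b" +
        "(?<_KC1>" + WordList + ")" +
        @"[ \t:]+" +
        @"(?<Month>Jun)(?<Year>\d\d)");

    // Gather the words captured by every group whose name starts with "_KC".
    public static List<string> CollectWords(Regex regex, Match match)
    {
        var words = new List<string>();
        foreach (string name in regex.GetGroupNames())
        {
            if (!name.StartsWith("_KC", StringComparison.Ordinal)) continue;
            foreach (Capture capture in match.Groups[name].Captures)
            {
                // A capture may hold several words separated by spaces or tabs.
                words.AddRange(capture.Value.Split(
                    new[] { ' ', '\t' }, StringSplitOptions.RemoveEmptyEntries));
            }
        }
        return words;
    }

    static void Main()
    {
        Match m = Pattern.Match("WTI AMERICAN:Jun12");
        Console.WriteLine(string.Join(",", CollectWords(Pattern, m))); // AMERICAN
    }
}
```

Filtering on the `_KC` prefix means a newly generated word-list group would be picked up without touching the collection code.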

The problem I have is with the first alternative, namely:

    # Pair of Strike prices
    (?<Strike>[+|-]?\d+(?:\.\d+)?)/(?<Strike2>[+|-]?\d+(?:\.\d+)?)

    # Add to Word List (but not 'x' as last word) !!!!!!!!!!!! This is what needs changing
    (?<_KC3>(?:(?:\w|[ \t\\/]|\[\w*\])*?))

    # Cross price
    (?:x[ \t]?-?(?<Cross>[+|-]?\d+(?:\.\d+)?)x?)?

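As you can see, the "Cross price" always starts with an 'x', so what I need is a pattern that is as similar as possible to the first fragment I mentioned, but that ignores the last word if it happens to be 'x'. There are two further complications: 1) the "Cross price" is itself optional; 2) an 'x' on its own can also match the "Futures expiry date" as a Reuters date code.

I have tried negative lookaheads and so on, but whatever I do I end up breaking something else. I believe the answer may lie in an If-Then-Else conditional, but I am not sure.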

For example:

WTI AMERICAN:Jun12 110.00 / 140.00 [1x2] call spread x 102.50 350 - 365

The "Pair of Strike prices" returns "110.00 / 140.00", as expected.

But the word list picks up "[1x2] call spread x", and "102.50", which should be the "Cross price", is then matched later in the expression as the "Bid" part of the "Bid/Offer spread".

Any help appreciated.

Cheers, Simon

# Match this group (optional)
(?:

    # Match one of the product symbols or their aliases
    \b(?<ProductSymbol>CL|Brent|GasOil|WTI|LO|BRT)\b

    # Add words to word list
    (?<_KC1>(?:(?:\w|[ \t\\/]|\[\w*\])*?))

    # Skip over whitespace plus any of these characters [:]
    [ \t:]+
)?

# Futures expiry date
(?<=[ \t]|'|^)(?<FuturesExpiryPeriod>(?<_MY>(?<_MYP>(?:Jan(?:uary)?|Feb(?:ruary)?|Mar(?:ch)?|Apr(?:il)?|May|Jun(?:e)?|Jul(?:y)?|Aug(?:ust)?|Sep(?:tember)?|Oct(?:ober)?|Nov(?:ember)?|Dec(?:ember)?))[ \t]?(?<_MYY>(?:20)?\d\d))|(?<_CE>Cal-?(?<_CEY>(?:20)?\d\d))|(?<_QF>Q(?:uarter)?(?<_QFP>1|2|3|4)[ \t]*(?<_QFY>(?:20)?\d\d))|(?<_QL>(?<_QLP>1|2|3|4)[ \t]*Q(?:uarter)?[ \t]*(?<_QLY>(?:20)?\d\d))|(?<_HY>(?<_HYP>1|2)[ \t]*H(?:alf)?[ \t]*(?<_HYY>(?:20)?\d\d))|(?<_ER>(?<_ERP>[FGHJKMNQUVXZ])(?<_ERY>\d{0,2}))[ \t]*)

# Skip over whitespace
[ \t]+

# Add words to word list
(?<_KC2>(?:(?:\w|[ \t\\/]|\[\w*\])*?))

# Match one of the following choices (in order):
(?:
    (?: # First choice

        # Pair of Strike prices
        (?<Strike>[+|-]?\d+(?:\.\d+)?)/(?<Strike2>[+|-]?\d+(?:\.\d+)?)

        # Add to Word List (but not 'x' as last word) !!!!!!!!!!!! This is what needs changing
        (?<_KC3>(?:(?:\w|[ \t\\/]|\[\w*\])*?))

        # Cross price
        (?:x[ \t]?-?(?<Cross>[+|-]?\d+(?:\.\d+)?)x?)?
    )
    |
    (?: # Second choice

        # Cross price
        (?:x[ \t]?-?(?<Cross>[+|-]?\d+(?:\.\d+)?)x?)

        # Add words to word list
        (?<_KC4>(?:(?:\w|[ \t\\/]|\[\w*\])*?))

        # Pair of Strike prices
        (?<Strike>[+|-]?\d+(?:\.\d+)?)/(?<Strike2>[+|-]?\d+(?:\.\d+)?)?
    )
    |
    (?: # Third choice

        # Single Strike price
        (?<Strike>[+|-]?\d+(?:\.\d+)?)

        # Add to Word List (but not 'x' as last word) !!!!!!!!!!!! This is what needs changing
        (?<_KC5>(?:(?:\w|[ \t\\/]|\[\w*\])*?))

        # Cross price
        (?:x[ \t]?-?(?<Cross>[+|-]?\d+(?:\.\d+)?)x?)?
    )
    |
    (?: # Fourth choice

        # Cross price
        (?:x[ \t]?-?(?<Cross>[+|-]?\d+(?:\.\d+)?)x?)

        # Add words to word list
        (?<_KC6>(?:(?:\w|[ \t\\/]|\[\w*\])*?))

        # Single Strike price
        (?<Strike>[+|-]?\d+(?:\.\d+)?)?
    )
)

# Add words to word list
(?<_KC7>(?:(?:\w|[ \t\\/]|\[\w*\])*?))

# Skip over whitespace plus any of these characters [,]
[ \t,]+

# Bid/Offer spread
(?<Bid>[+|-]?\d+(?:\.\d+)?)[ \t]*(?:/|-|\ )[ \t]*(?<Offer>[+|-]?\d+(?:\.\d+)?)

# Look for any other keywords in brackets (optional)
(?:

    # Skip over whitespace
    [ \t]*

    # <pattern>
    \(

    # Add words to word list
    (?<_KC8>(?:(?:\w|[ \t\\/]|\[\w*\])*?))

    # <pattern>
    \)
)?

1 answer:

Answer 0: (score: 0)

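If you are reading the input from a file or something similar, you would be better off parsing it with a tool such as awk. Avoid complex regular expressions, because they can cause problems in scenarios you did not anticipate. Cheers!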
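If the regex is kept, one possible approach to the question is to put a negative lookahead inside the word-list loop, so that the loop refuses to consume a standalone 'x' that begins a cross price. The sketch below is only a cut-down illustration, not the full generated expression. The names `CrossPriceDemo`, `Build`, `OriginalWords` and `GuardedWords` are assumptions, the sign class is written `[+-]` rather than `[+|-]`, and the sample line is assumed to have no spaces around the strike-pair slash so that it matches the `Strike`/`Strike2` pattern.

```csharp
using System;
using System.Text.RegularExpressions;

public static class CrossPriceDemo
{
    const string Num = @"[+-]?\d+(?:\.\d+)?";
    const string WordChar = @"(?:\w|[ \t\\/]|\[\w*\])";

    // Guard: fails at a standalone 'x' that is followed by a number (a cross price).
    const string NotCross = @"(?!\bx[ \t]?-?[+-]?\d)";

    public const string OriginalWords = WordChar + "*?";
    public const string GuardedWords = "(?:" + NotCross + WordChar + ")*?";

    // Cut-down version of the "first choice": strike pair, words, optional cross,
    // words, separator, bid/offer. The same word-list fragment is used for both groups.
    public static Regex Build(string words)
    {
        return new Regex(
            "(?<Strike>" + Num + ")/(?<Strike2>" + Num + ")" +
            "(?<_KC3>" + words + ")" +
            @"(?:x[ \t]?-?(?<Cross>" + Num + ")x?)?" +
            "(?<_KC7>" + words + ")" +
            @"[ \t,]+" +
            "(?<Bid>" + Num + @")[ \t]*[-/ ][ \t]*(?<Offer>" + Num + ")");
    }

    static void Main()
    {
        const string input =
            "WTI AMERICAN:Jun12 110.00/140.00 [1x2] call spread x 102.50 350 - 365";

        foreach (string words in new[] { OriginalWords, GuardedWords })
        {
            Match m = Build(words).Match(input);
            Console.WriteLine("Cross='{0}' Bid='{1}' Offer='{2}'",
                m.Groups["Cross"].Value, m.Groups["Bid"].Value, m.Groups["Offer"].Value);
        }
        // Original fragment: Cross=''       Bid='102.50' Offer='350'
        // Guarded fragment:  Cross='102.50' Bid='350'    Offer='365'
    }
}
```

The guard has to go into every word-list group that could otherwise swallow the cross price (here both `_KC3` and `_KC7`). If only `_KC3` is guarded, the later lazy group can still run past the 'x' before the engine backtracks far enough to try the cross price. The `\b` keeps the guard from firing on an 'x' inside a word or inside `[1x2]`. This sketch does not address the second complication, where a lone 'x' is a Reuters date code.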