Regex - finding lottery numbers

Asked: 2013-08-25 09:23:55

Tags: c# regex visual-studio

If possible, I'd like some guidance on regular expressions, because I'm rubbish at them :(

I have scanned a lottery ticket to text, and I'm trying to pull the lottery numbers out of the returned text.

This is the string that was returned:

"if * it • 
Including Millionaire Raffle
7618-011874089-204279   111111111111111111111111111111
Goad luck for your draw on Fri 09 Nov 12
Your numbers
Lucky Stars
A 1 8 22 37 47 48 - 03 10
B11 15 26 43 44 - 05 06
C 08 23 27 28 29 - 02 09
D06 09 21 26 29 - 01 05
E 06 07 21 22 45 - 04 05
Your raffle numbers) for your draw(s)
PRC690104 
PRC690105 
PRC690106 
PRC690107 
1DRC690108
CHECK YOUR MILLIONAIRE RAFFLE 
RESULTS ONLINE AT 
WWW.NATIONAL-LOTTERY.CO.UK
5 plays x f2.00 for 1 draw = f10.00
HUGE EUROMILLIONS JACKPOTS TO
PLAY FOR EVERY TUESDAY AND
FRIDAY! PLAY TODAY FOR THE
CHANCE TO WIN YOUR WILDEST
DREAMS!
7618-011874089-204279 035469 Term. 26048301
Fill the box to void the ticket
11111111111111111111111 1111111111111111111111111"

This is the scanned image:

The ticket that was scanned

As you can see, the lottery numbers always seem to appear between "Lucky Stars" and "Your raffle".

Can anyone suggest how to strip down the result so that I get "A18223747480310", "B11152643440506", "C08232728290209", "D06092126290105", "E06072122450405"?

Any help is much appreciated!

4 answers:

Answer 0 (score: 1)

A combination of Regex and string.Split would be simpler and more efficient:

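// inputString holds the OCR text. (?s) lets '.' match newlines as well.
Regex reg = new Regex("(?s)(?<=Lucky Stars).+?(?=Your raffle numbers)");
// Strip spaces, hyphens and stray carriage returns, then split into lines.
string[] yourNumbers = Regex.Replace(reg.Match(inputString).Value, "[ \r-]", "")
                            .Split(new char[]{'\n'}, StringSplitOptions.RemoveEmptyEntries);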

Answer 1 (score: 1)

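Let's try to keep things simple: each lottery line consists of a letter A-E followed by 14 digits, each of which may be preceded by any number of spaces and/or hyphens (-).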

So here is a regex that extracts each lottery line:

[A-E]([\s-]*\d){14}

Visualization (from the Debuggex demo):

Regular expression visualization

Then replace all whitespace and dashes with the empty string to get the desired result.
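The two steps of this answer (match each line, then strip the separators) could be put together roughly as follows. This is only a sketch: the class name `TicketLines` and the method `Extract` are made up for illustration and do not come from the answer, and the second pattern `[\s-]+` is one assumed way to do the "replace whitespace and dashes" step.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class TicketLines
{
    // One play line: a letter A-E, then exactly 14 digits, each digit
    // optionally preceded by whitespace and/or hyphens.
    private static readonly Regex LinePattern = new Regex(@"[A-E]([\s-]*\d){14}");

    // Runs of whitespace and hyphens to strip from each match.
    private static readonly Regex Separators = new Regex(@"[\s-]+");

    // Hypothetical helper (not part of the original answer): returns every
    // matched line with its whitespace and hyphens removed.
    public static string[] Extract(string ocrText)
    {
        return LinePattern.Matches(ocrText)
            .Cast<Match>()
            .Select(m => Separators.Replace(m.Value, ""))
            .ToArray();
    }
}
```

For example, `TicketLines.Extract("Lucky Stars\nA 1 8 22 37 47 48 - 03 10\nB11 15 26 43 44 - 05 06\n")` yields "A18223747480310" and "B11152643440506". Because the pattern is not anchored to "Lucky Stars", it relies on no other part of the OCR text looking like a letter A-E followed by 14 digits.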

Answer 2 (score: 0)

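Since the results have leading zeros (e.g. 08 for 8), a simple approach is to split the digits into groups of two. No regex is needed for that.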
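A minimal sketch of that idea, assuming the line has already been reduced to a letter followed by an even number of digits (such as "A18223747480310"). The names `TicketNumbers` and `SplitPairs` are hypothetical and only serve the illustration.

```csharp
using System;
using System.Collections.Generic;

public static class TicketNumbers
{
    // Hypothetical helper: splits the digits after the leading line letter
    // into two-character groups, keeping leading zeros ("03" stays "03").
    public static List<string> SplitPairs(string line)
    {
        var pairs = new List<string>();
        // Index 0 is the line letter; the digits start at index 1.
        for (int i = 1; i + 2 <= line.Length; i += 2)
        {
            pairs.Add(line.Substring(i, 2));
        }
        return pairs;
    }
}
```

`TicketNumbers.SplitPairs("A18223747480310")` gives 18, 22, 37, 47, 48, 03, 10. This also quietly repairs the OCR glitch in line A, where "18" was read as "1 8".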

Answer 3 (score: 0)

This pair of regular expressions works for the case you showed us.

/// <summary>
///  Regular expression built for C# on: Sun, Aug 25, 2013, 12:55:52 PM
///  Using Expresso Version: 3.0.4334, http://www.ultrapico.com
///  
///  A description of the regular expression:
///  
///  Match expression but don't capture it. [Lucky Stars\r\n]
///      Lucky Stars\r\n
///          Lucky
///          Space
///          Stars
///          Carriage return
///          New line
///  [Numbers]: A named capture group. [.*\r\n], exactly 5 repetitions
///      .*\r\n
///          Any character, any number of repetitions
///          Carriage return
///          New line
///  
///
/// </summary>
public static Regex regex = new Regex(
      "(?:Lucky Stars\\r\\n)(?<Numbers>.*\\r\\n){5}",
    RegexOptions.CultureInvariant
    | RegexOptions.Compiled
    );


// Strips all whitespace and hyphens; the Lucky Stars digits after " - " are kept.
public static Regex replaceRegex = new Regex(
      "[\\s-]+",
    RegexOptions.CultureInvariant
    | RegexOptions.Compiled
    );

The code to retrieve the numbers is as follows:

var InputText = @"Lucky Stars
A 1 8 22 37 47 48 - 03 10
B11 15 26 43 44 - 05 06
C 08 23 27 28 29 - 02 09
D06 09 21 26 29 - 01 05
E 06 07 21 22 45 - 04 05
Your raffle numbers";

Match m = regex.Match(InputText);
var numbers = m.Groups["Numbers"].Captures
    .OfType<Capture>()
    .Select(c => replaceRegex.Replace(c.Value, "").Replace(" ", ""));

However, I doubt that a regex is the best solution when the text comes from an image via OCR.