Question

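If possible, I'd appreciate some regular expression guidance, because I'm rubbish at them :(

I have scanned a lottery ticket to text, and I'm trying to extract the lottery numbers from the returned text.

This is the string that was returned: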

"if * it • 
Including Millionaire Raffle
7618-011874089-204279   111111111111111111111111111111
Goad luck for your draw on Fri 09 Nov 12
Your numbers
Lucky Stars
A 1 8 22 37 47 48 - 03 10
B11 15 26 43 44 - 05 06
C 08 23 27 28 29 - 02 09
D06 09 21 26 29 - 01 05
E 06 07 21 22 45 - 04 05
Your raffle numbers) for your draw(s)
PRC690104 
PRC690105 
PRC690106 
PRC690107 
1DRC690108
CHECK YOUR MILLIONAIRE RAFFLE 
RESULTS ONLINE AT 
WWW.NATIONAL-LOTTERY.CO.UK
5 plays x f2.00 for 1 draw = f10.00
HUGE EUROMILLIONS JACKPOTS TO
PLAY FOR EVERY TUESDAY AND
FRIDAY! PLAY TODAY FOR THE
CHANCE TO WIN YOUR WILDEST
DREAMS!
7618-011874089-204279 035469 Term. 26048301
Fill the box to void the ticket
11111111111111111111111 1111111111111111111111111"

This is the image that was scanned:

The ticket that was scanned

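As you can see, the lottery numbers always seem to appear between "Lucky Stars" and "Your raffle".

Can anyone advise how to strip out the results so that I get "A18223747480310", "B11152643440506", "C08232728290209", "D06092126290105", "E06072122450405"?

Any help is greatly appreciated!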

Answer 1

A combination of Regex and string.Split would be simpler and more efficient:

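// Requires: using System; using System.Text.RegularExpressions;
// inputString is the variable holding the OCR text shown in the question.
Regex reg = new Regex("(?s)(?<=Lucky Stars).+?(?=Your raffle numbers)");
string[] yourNumbers = Regex.Replace(reg.Match(inputString).Value, "[ -]", "")
                            .Split(new char[] { '\r', '\n' }, StringSplitOptions.RemoveEmptyEntries);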

Answer 2

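Let's try to keep things simple: each lottery line consists of a letter from A to E followed by 14 digits, and each digit may be preceded by any number of spaces and/or hyphens (-).

So here is a regular expression that extracts each lottery line: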

[A-E]([\s-]*\d){14}

Visualization (from the Debuggex demo):

Regular expression visualization

Then you get the desired result by replacing all whitespace and hyphens in each match with the empty string.
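The two steps described in this answer (match, then strip separators) can be sketched in C# as follows. This is only an illustration under some assumptions: the helper name ExtractLotteryLines and the shortened sample string are made up for this example, and it uses top-level statements, so it needs C# 9 / .NET 5 or later.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper (not from the original answer): returns each lottery
// line with its spaces and hyphens removed.
static string[] ExtractLotteryLines(string ocrText)
{
    // One letter A-E followed by exactly 14 digits, each digit optionally
    // preceded by whitespace and/or hyphens.
    return Regex.Matches(ocrText, @"[A-E]([\s-]*\d){14}")
                .Cast<Match>()
                .Select(m => Regex.Replace(m.Value, @"[\s-]", ""))
                .ToArray();
}

// Shortened sample taken from the OCR output in the question.
string sample =
    "Lucky Stars\n" +
    "A 1 8 22 37 47 48 - 03 10\n" +
    "B11 15 26 43 44 - 05 06\n" +
    "C 08 23 27 28 29 - 02 09\n" +
    "D06 09 21 26 29 - 01 05\n" +
    "E 06 07 21 22 45 - 04 05\n" +
    "Your raffle numbers) for your draw(s)\n" +
    "PRC690104\n";

foreach (string line in ExtractLotteryLines(sample))
    Console.WriteLine(line);
```

On this sample the raffle code PRC690104 is not picked up, because the C in it is followed by only six digits rather than fourteen.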

Answer 3

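Since the numbers are printed with leading zeros (e.g. 08 for 8), a simple approach is to drop the separators and split the digits into groups of two. No regular expression is needed.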
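One possible reading of this suggestion, sketched without any regular expression: keep only the digits of a line and cut them into two-character groups. The helper name SplitIntoPairs is hypothetical, and the sketch assumes the line has already been isolated from the rest of the OCR text (C# 9 / .NET 5 or later for top-level statements).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: returns the two-digit numbers found in one ticket line.
static List<string> SplitIntoPairs(string line)
{
    // Keep only the digits; the leading letter, spaces and hyphens are dropped.
    string digits = new string(line.Where(c => char.IsDigit(c)).ToArray());

    var pairs = new List<string>();
    // Walk the digits two at a time; a trailing single digit is ignored.
    for (int i = 0; i + 1 < digits.Length; i += 2)
        pairs.Add(digits.Substring(i, 2));
    return pairs;
}

List<string> numbers = SplitIntoPairs("C 08 23 27 28 29 - 02 09");
Console.WriteLine(string.Join(",", numbers));  // prints 08,23,27,28,29,02,09
```

Because the separators are discarded before grouping, a number that the OCR split in two (such as "1 8" on line A) is joined back into "18".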

Answer 4

This pair of regular expressions works for the case you showed us.

// Requires: using System.Linq; using System.Text.RegularExpressions;

/// <summary>
///  Regular expression built for C# on: Sun, Aug 25, 2013, 12:55:52 PM
///  Using Expresso Version: 3.0.4334, http://www.ultrapico.com
///  (adjusted to accept both \r\n and \n line endings)
///  
///  A description of the regular expression:
///  
///  Match expression but don't capture it. [Lucky Stars\r?\n]
///      Lucky Stars\r?\n
///          Lucky
///          Space
///          Stars
///          Carriage return, zero or one repetitions
///          New line
///  [Numbers]: A named capture group. [.*\r?\n], exactly 5 repetitions
///      .*\r?\n
///          Any character, any number of repetitions
///          Carriage return, zero or one repetitions
///          New line
/// </summary>
public static Regex regex = new Regex(
      "(?:Lucky Stars\\r?\\n)(?<Numbers>.*\\r?\\n){5}",
    RegexOptions.CultureInvariant
    | RegexOptions.Compiled
    );

// Removes every run of whitespace (including the line break) and hyphens
// from a captured line, so the Lucky Stars digits are kept.
public static Regex replaceRegex = new Regex(
      "[\\s-]+",
    RegexOptions.CultureInvariant
    | RegexOptions.Compiled
    );

The code that retrieves the numbers is as follows:

var InputText = @"Lucky Stars
A 1 8 22 37 47 48 - 03 10
B11 15 26 43 44 - 05 06
C 08 23 27 28 29 - 02 09
D06 09 21 26 29 - 01 05
E 06 07 21 22 45 - 04 05
Your raffle numbers";

Match m = regex.Match(InputText);
var numbers = m.Groups["Numbers"].Captures
    .OfType<Capture>()
    .Select(c => replaceRegex.Replace(c.Value, "").Replace(" ", ""));

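However, I doubt that regular expressions are the best solution when the text comes from a picture via OCR, since the recognized text may contain errors.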
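As a small illustration of that concern, a sanity check on each cleaned line can at least flag obvious OCR misreads before the numbers are used. This is a sketch only: the helper name LooksLikeLotteryLine is hypothetical, and the expected shape (one letter A-E plus 14 digits) is taken from the sample ticket in the question, not from any official ticket format.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical check: a cleaned line must be exactly one letter A-E
// followed by 14 ASCII digits, with nothing before or after.
static bool LooksLikeLotteryLine(string cleaned)
{
    return Regex.IsMatch(cleaned, @"\A[A-E][0-9]{14}\z");
}

Console.WriteLine(LooksLikeLotteryLine("A18223747480310"));  // True
// The letter O read in place of a zero is rejected.
Console.WriteLine(LooksLikeLotteryLine("A1822374748O310"));  // False
```

A check like this cannot detect a digit that was misread as another digit, so it reduces the risk rather than removing it.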

Regex - finding lottery numbers
