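If possible, I would like some guidance on regular expressions, because I am rubbish at them :(
I have scanned a lottery ticket to text, and I am trying to pull the lottery numbers out of the returned text.
This is the returned string: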
"if * it •
Including Millionaire Raffle
7618-011874089-204279 111111111111111111111111111111
Goad luck for your draw on Fri 09 Nov 12
Your numbers
Lucky Stars
A 1 8 22 37 47 48 - 03 10
B11 15 26 43 44 - 05 06
C 08 23 27 28 29 - 02 09
D06 09 21 26 29 - 01 05
E 06 07 21 22 45 - 04 05
Your raffle numbers) for your draw(s)
PRC690104
PRC690105
PRC690106
PRC690107
1DRC690108
CHECK YOUR MILLIONAIRE RAFFLE
RESULTS ONLINE AT
WWW.NATIONAL-LOTTERY.CO.UK
5 plays x f2.00 for 1 draw = f10.00
HUGE EUROMILLIONS JACKPOTS TO
PLAY FOR EVERY TUESDAY AND
FRIDAY! PLAY TODAY FOR THE
CHANCE TO WIN YOUR WILDEST
DREAMS!
7618-011874089-204279 035469 Term. 26048301
Fill the box to void the ticket
11111111111111111111111 1111111111111111111111111"
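This is the scanned image: [image of the scanned lottery ticket]
As you can see, the lottery numbers always seem to appear between "Lucky Stars" and "Your raffle".
Can anyone suggest how to strip the result down so that I get "A18223747480310", "B11152643440506", "C08232728290209", "D06092126290105", "E06072122450405"?
Any help is much appreciated!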
Answer 0 (score: 1)
A combination of Regex and string.Split would be simpler and more efficient:
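// Capture everything between the two marker phrases; (?s) lets '.' match newlines.
Regex reg = new Regex("(?s)(?<=Lucky Stars).+?(?=Your raffle numbers)");
// inputString is the variable holding the OCR text (not the literal "inputString").
// Remove spaces and hyphens, then split on either kind of line break.
string[] yourNumbers = Regex.Replace(reg.Match(inputString).Value, "[ -]", "")
    .Split(new char[] { '\r', '\n' }, StringSplitOptions.RemoveEmptyEntries);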
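Answer 1 (score: 1)
Let's try to keep things simple: each lottery number consists of a letter from A to E followed by 14 digits, where any number of spaces and/or hyphens (-) may appear before each digit.
So here is a regular expression that extracts each lottery number: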
[A-E]([\s-]*\d){14}
[Image: visualization of the regular expression, from the Debuggex demo]
You then get the desired result by replacing all whitespace and dashes with an empty string.
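The two steps in this answer (match each line, then strip the separators) could be combined roughly as follows. This is only a sketch: the class name TicketLines, the method Extract and the shortened sample string are hypothetical names chosen for illustration, and it assumes the OCR output keeps the same shape as the ticket in the question.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class TicketLines
{
    // Pattern from the answer: a letter A-E followed by 14 digits,
    // each digit optionally preceded by whitespace and/or hyphens.
    private static readonly Regex LinePattern = new Regex(@"[A-E]([\s-]*\d){14}");

    // Hypothetical helper: returns every matched line with whitespace and hyphens removed.
    public static string[] Extract(string ocrText)
    {
        return LinePattern.Matches(ocrText)
            .Cast<Match>()
            .Select(m => Regex.Replace(m.Value, @"[\s-]", ""))
            .ToArray();
    }

    public static void Main()
    {
        // Shortened sample taken from the OCR text in the question.
        string sample = "Lucky Stars\n"
            + "A 1 8 22 37 47 48 - 03 10\n"
            + "B11 15 26 43 44 - 05 06\n"
            + "Your raffle numbers) for your draw(s)";

        foreach (string line in Extract(sample))
        {
            Console.WriteLine(line);
        }
        // Prints:
        // A18223747480310
        // B11152643440506
    }
}
```

Note that \s also matches line breaks, so a line on which the OCR dropped a digit could borrow digits from the following line. Checking that each match contains no newline would be a cheap safeguard.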
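Answer 2 (score: 0)
Since the results have leading zeros (e.g. 08 for 8), a simple approach is to split after every 2 digits. No regular expression is needed.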
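One possible reading of this suggestion, sketched without any regular expression: keep only the digits of a line and cut them into pairs. The names PairSplitter and SplitPairs are hypothetical, and the sketch assumes the OCR always returns an even number of digits per line, which the answer does not guarantee.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class PairSplitter
{
    // Hypothetical helper: returns the two-digit numbers on one ticket line,
    // ignoring the leading letter, spaces and hyphens.
    public static List<string> SplitPairs(string line)
    {
        // Keep only the digit characters, in their original order.
        string digits = new string(line.Where(c => char.IsDigit(c)).ToArray());

        var pairs = new List<string>();
        // Step through the digits two at a time; a trailing odd digit is dropped.
        for (int i = 0; i + 1 < digits.Length; i += 2)
        {
            pairs.Add(digits.Substring(i, 2));
        }
        return pairs;
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(",", SplitPairs("C 08 23 27 28 29 - 02 09")));
        // Prints: 08,23,27,28,29,02,09
    }
}
```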
Answer 3 (score: 0)
This pair of regular expressions works for the case you showed us.
/// <summary>
/// Regular expression built for C# on: Sun, Aug 25, 2013, 12:55:52 PM
/// Using Expresso Version: 3.0.4334, http://www.ultrapico.com
///
/// A description of the regular expression:
///
/// Match expression but don't capture it. [Lucky Stars\r\n]
/// Lucky Stars\r\n
/// Lucky
/// Space
/// Stars
/// Carriage return
/// New line
/// [Numbers]: A named capture group. [.*\r\n], exactly 5 repetitions
/// .*\r\n
/// Any character, any number of repetitions
/// Carriage return
/// New line
///
///
/// </summary>
public static Regex regex = new Regex(
"(?:Lucky Stars\\r\\n)(?<Numbers>.*\\r\\n){5}",
RegexOptions.CultureInvariant
| RegexOptions.Compiled
);
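// Strips whitespace (including the trailing line break) and hyphens, so the
// Lucky Stars digits after the " - " are kept, as the question asks.
public static Regex replaceRegex = new Regex(
"[\\s-]+",
RegexOptions.CultureInvariant
| RegexOptions.Compiled
);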
The code to retrieve the numbers is as follows:
var InputText = @"Lucky Stars
A 1 8 22 37 47 48 - 03 10
B11 15 26 43 44 - 05 06
C 08 23 27 28 29 - 02 09
D06 09 21 26 29 - 01 05
E 06 07 21 22 45 - 04 05
Your raffle numbers";
Match m = regex.Match(InputText);
var numbers = m.Groups["Numbers"].Captures
.OfType<Capture>()
.Select(c => replaceRegex.Replace(c.Value, "").Replace(" ", ""));
However, I doubt that regular expressions are the best solution when the text comes from OCR on an image.