Extracting data from scanned receipts

Time: 2018-04-04 02:09:09

Tags: php regex neural-network

I'm getting text strings from scanned receipts. Here are a couple of examples:

  

George's Restaurant 300 72th Street Miami Beach fl 33141 305-864-5586 Server: Ronald 01/19/2013 Table 20/1 10:53 PM Guests: 1 10062 Reprint #: 1 Ferrari Carano Insalate Cesare Caprese with prosciutto FISH SPEC Spinach Ricotta Ravioli Seafood Pasta Ossobucco 47.00 7.50 9.50 25.95 15.95 19.95 29.95 Sub Total Tax 155.80 14.02 Total 169.82 169.82 Balance Due GRATUITY NOT INCLUDED!!! Thank you for your business

SUSHI HARA 8701 W PARMER LANE STE 2128 AUSTIN, TX 78729 123835218 ORDER: A9 Dine-in 25-Jan-2018 6 10 53 1 다tASHU DON SHRIMP TEMPURA (3PCS HARU COMBO SALMON ROLL $11.95 $8.95 $20.00 $7.95 to go Subtotal $48.85 $4.03 S52.88 Tax Total Order 05852ZSBGOW4M Thank you for dining at Sushi Hara

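How can I determine the total amount in each case (169.82 and 52.88)?

My first thought was to strip all non-numeric characters, split what is left into an array, and look for the largest number. But the address and phone number would throw that off. I think I need to make sure the number is near the words TOTAL, SUB-TOTAL, or AMOUNT DUE.

Any suggestions? Thanks.

Another example: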

  

933 ece tur New OrlerS LA 70116 504.:25.1602 wwwfranksresta.ratnewor leans.com 219 KATHY U che 1750 Feb03'1 (7:-2PM Tbl 6/1 Gst 4 1 GARLICBREAD 2 Diet 2 Iced Tea 2 TASTE OF NO 1 Whole Muff 1 Alfredo 3,95 6.00 6.00 33.90 14.95 14.95 Food Tax TOTAL DUE 79.75 7.78 87.53


Update:

It looks like I need to look into neural networks to solve this.

1 Answer:

Answer 0: (score: 1)

Try this:

<?php

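// Print the largest dollar amount found in an OCR'd receipt string.
function checktotal($rcpt) {
    // Capture numbers with exactly two decimals (e.g. 169.82) that are followed
    // by a non-digit or the end of the string; ZIP codes and phone numbers do not match.
    if (preg_match_all('/(\d+\.\d{2})(?:\D|$)/', $rcpt, $match)) {
        // Assumes the total is the largest amount on the receipt.
        echo 'Total is $' . max($match[1]) . "\n";
    } else {
        echo "No numbers!\n";
    }
}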

$rcpts = [
    "George's Restaurant 300 72th Street Miami Beach fl 33141 305-864-5586 Server: Ronald 01/19/2013 Table 20/1 10:53 PM Guests: 1 10062 Reprint #: 1 Ferrari Carano Insalate Cesare Caprese with prosciutto FISH SPEC Spinach Ricotta Ravioli Seafood Pasta Ossobucco 47.00 7.50 9.50 25.95 15.95 19.95 29.95 Sub Total Tax 155.80 14.02 Total 169.82 169.82 Balance Due GRATUITY NOT INCLUDED!!! Thank you for your business",
    "SUSHI HARA 8701 W PARMER LANE STE 2128 AUSTIN, TX 78729 123835218 ORDER: A9 Dine-in 25-Jan-2018 6 10 53 1 다tASHU DON SHRIMP TEMPURA (3PCS HARU COMBO SALMON ROLL $11.95 $8.95 $20.00 $7.95 to go Subtotal $48.85 $4.03 S52.88 Tax Total Order 05852ZSBGOW4M Thank you for dining at Sushi Hara",
    "933 ece tur New OrlerS LA 70116 504.:25.1602 wwwfranksresta.ratnewor leans.com 219 KATHY U che 1750 Feb03'1 (7:-2PM Tbl 6/1 Gst 4 1 GARLICBREAD 2 Diet 2 Iced Tea 2 TASTE OF NO 1 Whole Muff 1 Alfredo 3,95 6.00 6.00 33.90 14.95 14.95 Food Tax TOTAL DUE 79.75 7.78 87.53"
    ];
foreach ($rcpts as $rcpt) checktotal($rcpt);

Output for the test set:

Total is $169.82
Total is $52.88
Total is $87.53
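
If you also want the keyword check mentioned in the question, one possible variation is to search only the text that follows the first total-like label and fall back to the whole string when no label is found. This is only a rough sketch: the name findTotal, the label list (Sub Total, Total, Amount Due), and the assumption that the total is printed after its label are mine, not something the receipts guarantee, and OCR output that puts amounts before their labels would defeat it.

```php
<?php

// Hypothetical helper: returns the most likely total as a string such as
// "169.82", or null when the receipt contains no two-decimal amount.
function findTotal(string $receipt): ?string
{
    $pattern = '/(\d+\.\d{2})(?!\d)/'; // digits, a dot, exactly two decimals

    // Assumption: the total appears after a label such as "Sub Total",
    // "Total", or "Amount Due", so start searching at the first such label.
    $tail = $receipt;
    if (preg_match('/sub\W?total|total|amount\s+due/i', $receipt, $m, PREG_OFFSET_CAPTURE)) {
        $tail = substr($receipt, $m[0][1]); // byte offset, matches substr()
    }

    // Fall back to the whole string if nothing follows the label.
    if (!preg_match_all($pattern, $tail, $amounts)
        && !preg_match_all($pattern, $receipt, $amounts)) {
        return null;
    }

    // Pick the largest amount, comparing numerically but keeping the
    // original text so trailing zeros such as "20.00" survive.
    $best = null;
    foreach ($amounts[1] as $amount) {
        if ($best === null || (float) $amount > (float) $best) {
            $best = $amount;
        }
    }
    return $best;
}
```

Calling findTotal($rcpt) on the three strings in $rcpts gives 169.82, 52.88 and 87.53, the same as checktotal, but a large decimal that appears before the label (a loyalty balance, say) no longer wins.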