How do I separate fields from a large multi-line OCR string?

Time: 2016-10-22 05:10:16

Tags: java ocr

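I am developing an OCR-based Android application. It obtains the following text dynamically as a string from an image (the text is read from the image horizontally).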

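Image text:

"Parts Cost Engine Oil and Oil Filter Replacement Rs 10000 Air Filter Rs 45000 Cabin AC Micro Filter Rs 40000 Pollen Filter Rs 12000 AC Disinfectant Rs 30000 Fuel Filter Rs 60000 Spark Plug Set Replacement Rs 10000 Body Wash, Basic Cleaning 8, Engine Degreasing F2s 30000 Body Wax Polish Detailing Rs 70000 Car Interior Dry Cleaning with Germ Clean Rs 80000 Wheel Alignment 8. Balancing Rs 60000 Brake Pad Replacement (Pair) Rs 30000 Brake Disc Replacement (Pair) Rs 30000 Electric Foldable \ Heated Side ORVM Replacement Rs 40000 Battery Replacement Rs 25000 Front Shock Absorber Pair Assembly (Both Left and Right) Rs 60000 Headlight Assembly (Xenon) Rs 15000 Alloy Wheel Set (16 inch - 17 inch) - Set of 4 Alloys Rs 12000 Fuel Injector Replacement Rs 12000 Fuel Assembly (F'ump + Injector + Fuel Unit + Distributor) Rs 30000 Bumper Replacement Rs 60000 Bonnet Replacement Rs 10000 Intercooler Replacement Rs 40000 AC Compressor Assembly Replacement Rs 20000 AC Condenser, Radia tor Replacement Rs 10000 Body Work like Dent Removal with Minor Scratch Repair Work with Paint Rs 18000 Windshield Replacement Rs 35000 Suspension Overhaul (Retro Kit of Suspension including Lower Arm, Rs 20000 Transmission System Failure - replacement (extreme rare cases) Rs 70000 TOTAL Rs 50, 00000 00".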

Example:
Engine Oil and Oil Filter Replacement Rs 10000
key = Engine Oil and Oil Filter Replacement
value = 10000

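I need to separate the parts from the costs (two columns per row), extract the values, and store them in a SQLite database on Android. I don't know how to extract the values and split them apart.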

2 answers:

Answer 0: (score: 1)

Android_Dev's solution is quite complicated. (Sorry, dude.)

This code:

ocrText = ocrText.replaceAll(" F2s "," Rs "); // Error in OCR
java.util.regex.Pattern lines = java.util.regex.Pattern.compile("(.*?) Rs (\\d+) *");
java.util.regex.Matcher matchLines = lines.matcher(ocrText);
while (matchLines.find()) {
    System.out.println("\nkey = " + matchLines.group(1));
    System.out.println("value = " + matchLines.group(2));
}

does what you want and prints:

key = Parts Cost Engine Oil and Oil Filter Replacement
value = 10000

key = Air Filter
value = 45000

(...)

key = Windshield Replacement
value = 35000

key = Suspension Overhaul (Retro Kit of Suspension including Lower Arm,
value = 20000

key = Transmission System Failure - replacement (extreme rare cases)
value = 70000

key = TOTAL
value = 50

(Next time, please mention the 'Rs' separator. How were we supposed to guess it?)
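Since the goal is to store the pairs rather than print them, the same loop can be wrapped in a method that returns them. The following is only a minimal sketch under a few assumptions: the class name `OcrPriceParser` and the method `parse` are hypothetical (they do not appear in the answer), the separator is always ` Rs ` apart from the one known ` F2s ` misread, and every price fits in an `int`.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; the class and method names are illustrative only.
final class OcrPriceParser {

    // Same pattern as above: lazily capture the label, then the digits after " Rs ".
    private static final Pattern ENTRY = Pattern.compile("(.*?) Rs (\\d+) *");

    /** Returns label -> price, in the order the entries appear in the OCR text. */
    static Map<String, Integer> parse(String ocrText) {
        Map<String, Integer> result = new LinkedHashMap<>();
        if (ocrText == null) {
            return result;
        }
        // Repair the known OCR misread of the separator before matching.
        String normalized = ocrText.replace(" F2s ", " Rs ");
        Matcher m = ENTRY.matcher(normalized);
        while (m.find()) {
            String key = m.group(1).trim();
            try {
                result.put(key, Integer.parseInt(m.group(2)));
            } catch (NumberFormatException e) {
                // Assumption: a digit run too long for an int is OCR noise, so skip it.
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String sample = "Air Filter Rs 45000 Pollen Filter Rs 12000 "
                + "Body Wash, Basic Cleaning 8, Engine Degreasing F2s 30000";
        for (Map.Entry<String, Integer> e : parse(sample).entrySet()) {
            System.out.println(e.getKey() + " -> " + e.getValue());
        }
    }
}
```

Each entry of the returned map could then be written as one row of the SQLite table. Note that a label occurring twice would overwrite the earlier entry, and that a broken number such as the `TOTAL Rs 50, 00000 00` line still yields only `50`, exactly as in the output above.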

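Answer 1: (score: 0)

In your case the data has no standard format (and we cannot expect one from an OCR library), so you have to write your own custom parser. You can use the function below to parse the data.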

import android.util.Log;
import java.util.HashMap;
import java.util.Map;

// Returns the parsed pairs so the caller can store them (the original built the map but discarded it).
public static Map<String, String> parseResponse(String data)
{
    final String SEPARATOR = "Rs";
    Map<String, String> keyValueMap = new HashMap<>();

    if (data == null || data.length() == 0)
        return keyValueMap;

    int startIndex = 0;

    while (startIndex < data.length())
    {
        int endIndex = data.indexOf(SEPARATOR, startIndex);

        if (endIndex == -1) // Break loop if separator not found
            break;

        String key = data.substring(startIndex, endIndex).trim();
        Log.d("", " Key = " + key);

        startIndex = endIndex + SEPARATOR.length() + 1; // plus one for the space character

        if (startIndex >= data.length()) // separator was the last token, no value follows
            break;

        endIndex = data.indexOf(" ", startIndex);

        if (endIndex == -1)
            endIndex = data.length(); // value runs to the end of the string

        String value = data.substring(startIndex, endIndex);
        Log.d("", " Value = " + value);

        keyValueMap.put(key, value);

        startIndex = endIndex; // continue scanning after the value
    }

    return keyValueMap;
}