Question

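I am using the Microsoft Cognitive Services Computer Vision API as an OCR service to read food menus.

I can scan a menu successfully, but now I want to save the dish names and the prices in two separate arrays.

At the moment the scanned output is cluttered with special characters such as the rupee symbol, parentheses, and menu item numbers.

I only want the dish names and the prices, without the rupee sign. Can anyone tell me how to achieve this? Here is the GitHub link and some code that may help you help me: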

@Override
    protected void onPostExecute(String data) {
        super.onPostExecute(data);
        // Show the error message if the request failed, otherwise the OCR text

        if (e != null) {
            mEditText.setText("Error: " + e.getMessage());
            this.e = null;
        } else {
            Gson gson = new Gson();
            OCR r = gson.fromJson(data, OCR.class);

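            // Rebuild the recognized text: one output line per OCR line,
            // with extra blank lines between regions. StringBuilder avoids
            // copying the whole string on every append.
            StringBuilder sb = new StringBuilder();
            for (Region reg : r.regions) {
                for (Line line : reg.lines) {
                    for (Word word : line.words) {
                        sb.append(word.text).append(' ');
                    }
                    sb.append('\n');
                }
                sb.append("\n\n");
            }
            String result = sb.toString();

            mEditText.setText(result);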
        }
        mButtonSelectImage.setEnabled(true);
    }

What I want:

1) I don't want any of these special characters in the result.

2) I want to save the dish names and the prices in two separate arrays.

Here are the screenshots of the output and the menu.

Answer 1

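I found the answer. Thanks for your help, everyone!

I kept only the letters with this regular expression (every non-letter character is replaced by a space):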

resultString = result.replaceAll("\\P{L}", " ");

and only the digits (plus the decimal point) with this one:

resultNumber = result.replaceAll("[^\\d.]", "");
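
One caveat: both calls run over the whole `result` string, and line breaks are neither letters nor digits, so the second call merges every price into one long run of digits. To get two parallel lists, the same two ideas can be applied one line at a time. Below is a minimal sketch of that, not the code from the project. `MenuParser`, `parse`, `names` and `prices` are hypothetical names, and it assumes each dish sits on its own OCR line with the price as the last number on that line.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of the original project.
class MenuParser {
    // Assumption: the price is the last number on the line, optionally
    // with a decimal part, followed only by non-digits such as "/-".
    private static final Pattern PRICE =
            Pattern.compile("(\\d+(?:\\.\\d+)?)\\D*$");

    final List<String> names = new ArrayList<>();
    final List<String> prices = new ArrayList<>();

    static MenuParser parse(String ocrText) {
        MenuParser out = new MenuParser();
        for (String line : ocrText.split("\\r?\\n")) {
            Matcher m = PRICE.matcher(line);
            if (!m.find()) {
                continue; // no number at all: treated as a heading or blank line
            }
            // Everything before the price is the name; runs of non-letters
            // (item numbers, brackets, currency symbols) collapse to one space.
            String name = line.substring(0, m.start())
                    .replaceAll("\\P{L}+", " ")
                    .trim();
            if (name.isEmpty()) {
                continue; // a bare number without a dish name
            }
            out.names.add(name);
            out.prices.add(m.group(1));
        }
        return out;
    }

    public static void main(String[] args) {
        // \u20B9 is the rupee sign.
        String sample = "1) Paneer Tikka \u20B9 120.50\n(2) Masala Dosa - 80/-\n\nSTARTERS\n";
        MenuParser menu = parse(sample);
        System.out.println(menu.names);  // [Paneer Tikka, Masala Dosa]
        System.out.println(menu.prices); // [120.50, 80]
    }
}
```

In `onPostExecute` this would be called as `MenuParser.parse(result)`. Note that a textual "Rs" consists of letters, so it would survive in the name and need an extra replacement, and prices with thousands separators such as "1,200" are not handled by this pattern.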

How can I tell words and numbers apart in scanned text?
