Question

It works perfectly when I call it from the terminal!

Node Traduction = document.getChildNodes().item(0);
NodeList traductionChildNodes = Traduction.getChildNodes();
Node Sortie = null;
for (int i = 0; i < traductionChildNodes.getLength(); i++) {
    Node node = traductionChildNodes.item(i);
    // here we check the node name
    if ("Sortie".equals(node.getNodeName())) {
        Sortie = node;
        break;
    }
}
NodeList sortieChildNodes = Sortie.getChildNodes();
// we got the texts in an array so we can access them one after another
String[] texts = new String[] {"AAA", "001", "002", "BBB"};
// nodeIndex is for the nodes, textIndex is for the texts
for (int nodeIndex = 0, textIndex = 0; nodeIndex < sortieChildNodes.getLength(); nodeIndex++) {
    Node node = sortieChildNodes.item(nodeIndex);
    // here we check the node type
    if (node.getNodeType() == Node.ELEMENT_NODE) {
        node.setTextContent(texts[textIndex++]);
    }
}

But I am trying to make it work with Tika.

tesseract 1.jpg outPutFileHere -l fra

With the same text image I get no result from Tika :( Do you know what is going on?

Thanks

Answer 1

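You need to supply a header named "X-Tika-OCRLanguage". Its value uses Tesseract language codes, so the French image from the question would use "fra". For example: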

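from tika import parser

# Language codes are joined with "+", as on the tesseract command line.
headers = {
    "X-Tika-OCRLanguage": "eng+nor"
}
# path is the image or document to parse.
parsed = parser.from_file(path, headers=headers)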
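
The same idea can be wrapped in a small helper. This is only a sketch: `ocr_language_headers` and `extract_text` are hypothetical names that do not come from tika-python, and it assumes a Tika server with Tesseract and the requested language data is available. The only tika-python call it relies on is `parser.from_file` with its `headers` argument, as used above.

```python
from typing import Dict, Iterable


def ocr_language_headers(languages: Iterable[str]) -> Dict[str, str]:
    """Build the X-Tika-OCRLanguage header from Tesseract language codes."""
    codes = [code.strip() for code in languages if code.strip()]
    if not codes:
        raise ValueError("at least one language code is required")
    # Multiple languages are joined with '+', e.g. "eng+nor".
    return {"X-Tika-OCRLanguage": "+".join(codes)}


def extract_text(path: str, languages: Iterable[str]) -> str:
    """Parse a file through Tika, running OCR with the given languages."""
    # Imported lazily so the header helper works without tika installed.
    from tika import parser

    parsed = parser.from_file(path, headers=ocr_language_headers(languages))
    # "content" can be None when Tika extracts no text.
    return (parsed.get("content") or "").strip()
```

For the image in the question, the call would be extract_text("1.jpg", ["fra"]), which mirrors the -l fra option of the tesseract command.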

How do I bundle Tika-Python with Tesseract OCR?
