When I call it from the terminal, it works perfectly!
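import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// getDocumentElement() returns the root element; item(0) could instead return a DOCTYPE or comment
Node traduction = document.getDocumentElement();
NodeList traductionChildNodes = traduction.getChildNodes();
Node sortie = null;
for (int i = 0; i < traductionChildNodes.getLength(); i++) {
    Node node = traductionChildNodes.item(i);
    // here we check the node name
    if ("Sortie".equals(node.getNodeName())) {
        sortie = node;
        break;
    }
}
if (sortie == null) {
    throw new IllegalStateException("No <Sortie> element found under the root element");
}
NodeList sortieChildNodes = sortie.getChildNodes();
// the texts are in an array so we can assign them one after another
String[] texts = new String[] {"AAA", "001", "002", "BBB"};
// nodeIndex walks the child nodes, textIndex walks the texts array;
// stopping when either runs out avoids an ArrayIndexOutOfBoundsException
for (int nodeIndex = 0, textIndex = 0; nodeIndex < sortieChildNodes.getLength() && textIndex < texts.length; nodeIndex++) {
    Node node = sortieChildNodes.item(nodeIndex);
    // only element nodes get a text; whitespace text nodes and comments are skipped
    if (node.getNodeType() == Node.ELEMENT_NODE) {
        node.setTextContent(texts[textIndex++]);
    }
}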
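For comparison, the same "find <Sortie>, then fill its element children in order" logic can be sketched in Python with the standard library's xml.dom.minidom. This is a minimal sketch under assumptions: the sample XML and the child element names a, b, c, d are hypothetical (only Traduction and Sortie come from the snippet), and fill_sortie is an illustrative helper name.

```python
from xml.dom import minidom

# Hypothetical sample document: only "Traduction" and "Sortie" come from the
# snippet; the child element names a, b, c, d are made up for illustration.
SAMPLE = """<Traduction>
  <Sortie>
    <a/>
    <b/>
    <c/>
    <d/>
  </Sortie>
</Traduction>"""


def fill_sortie(xml_text, texts):
    """Set the text of each element child of <Sortie>, in order; return the XML."""
    doc = minidom.parseString(xml_text)
    root = doc.documentElement
    # find the first direct child element named "Sortie"
    sortie = next(
        (n for n in root.childNodes
         if n.nodeType == n.ELEMENT_NODE and n.nodeName == "Sortie"),
        None,
    )
    if sortie is None:
        raise ValueError("no <Sortie> element under the root element")
    # skip whitespace text nodes and comments, keep only elements
    elements = [n for n in sortie.childNodes if n.nodeType == n.ELEMENT_NODE]
    # zip stops at the shorter sequence, so extra elements or extra texts are ignored
    for element, text in zip(elements, texts):
        # replace any existing children with a single text node
        while element.firstChild is not None:
            element.removeChild(element.firstChild)
        element.appendChild(doc.createTextNode(text))
    return doc.documentElement.toxml()
```

Calling fill_sortie(SAMPLE, ["AAA", "001", "002", "BBB"]) turns <a/> into <a>AAA</a>, and so on through <d>BBB</d>.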
But I'm trying to get it working with Tika. The terminal command is:
tesseract 1.jpg outPutFileHere -l fra
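To compare Tika's output with the terminal call, the same command can be reproduced from Python. This is a minimal sketch, assuming the tesseract binary and its French language data are installed and on the PATH; tesseract_cmd and run_tesseract are hypothetical helper names. Tesseract appends ".txt" to the output base name, so the text ends up in outPutFileHere.txt.

```python
import subprocess
from pathlib import Path


def tesseract_cmd(image, output_base, lang="fra"):
    """Build the argument list for: tesseract IMAGE OUTPUT_BASE -l LANG"""
    return ["tesseract", str(image), str(output_base), "-l", lang]


def run_tesseract(image, output_base, lang="fra"):
    """Run the tesseract CLI and return the recognized text.

    Requires the tesseract binary and the traineddata for LANG to be installed.
    """
    # check=True raises CalledProcessError if tesseract exits with an error
    subprocess.run(tesseract_cmd(image, output_base, lang), check=True)
    # tesseract writes its plain-text result to OUTPUT_BASE + ".txt"
    return Path(str(output_base) + ".txt").read_text(encoding="utf-8")
```

For example, print(run_tesseract("1.jpg", "outPutFileHere")) should print the same text the terminal command produces.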
With the same image of text, Tika gives me no result :(
Do you know what's going on?
Thanks
Answer 0 (score: 0)
You need to pass a header named "X-Tika-OCRLanguage" that tells Tika which Tesseract language(s) to use, for example:
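from tika import parser

path = "1.jpg"  # the image from the question
headers = {
    "X-Tika-OCRLanguage": "eng+nor"  # Tesseract language codes joined with "+"; "fra" for French
}
parsed = parser.from_file(path, headers=headers)
print(parsed["content"])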
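Applied to the question's French image, a small helper can build the header from one or more language codes. This is a sketch, assuming the tika package is installed and can reach a Tika server that has Tesseract plus the matching language data; ocr_language_headers and ocr_with_tika are hypothetical names, while parser.from_file and its headers argument are the ones used above.

```python
def ocr_language_headers(*languages):
    """Build the Tika header selecting Tesseract language(s), e.g. "eng+nor"."""
    if not languages:
        raise ValueError("at least one Tesseract language code is required")
    return {"X-Tika-OCRLanguage": "+".join(languages)}


def ocr_with_tika(path, *languages):
    """Parse PATH with tika-python, asking Tesseract to use LANGUAGES."""
    # imported lazily so ocr_language_headers works without tika installed
    from tika import parser

    parsed = parser.from_file(path, headers=ocr_language_headers(*languages))
    # "content" can be None when Tika extracts no text
    return (parsed.get("content") or "").strip()
```

For the image from the question this would be print(ocr_with_tika("1.jpg", "fra")); an empty string means Tika still returned no text.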