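I am using the Amazon Textract .NET SDK to extract text from images. It returns a list of blocks as part of the response. I need to pull key-value pairs out of the extracted text. I think we need to iterate over the list of blocks and check for KEY_VALUE_SET.
Is my understanding correct? Can someone give me a piece of code that returns the key-value pairs once the text has been extracted?
My sample code: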
var DocRequest = new AnalyzeDocumentRequest()
{
    Document = MyDocument,
    FeatureTypes = new List<string> { Amazon.Textract.FeatureType.FORMS, Amazon.Textract.FeatureType.TABLES }
};
// AnalyzeDocumentAsync returns a Task; await it (inside an async method) to get the response with its Blocks.
var response = await client.AnalyzeDocumentAsync(DocRequest);
Answer 0: (score: 1)
The following code populates the maps. The AnalyzeResult class is just a holder for the key map, the value map, and the list of pages.
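// Index every block by ID and split the KEY_VALUE_SET blocks into keys and values.
List<Block> blocks = result.getBlocks();
for (Block block : blocks) {
    String blockId = block.getId();
    analyzeResult.blockMap.put(blockId, block);
    String blockType = block.getBlockType();
    switch (blockType) {
        case AppConstant.BLOCK_PAGE:
            // Start a new page of text lines.
            page = new ArrayList<TextLine>();
            analyzeResult.pages.add(page);
            break;
        case AppConstant.BLOCK_KEY_VALUE_SET:
            // The entity type says whether this block is the KEY half or the VALUE half.
            if (block.getEntityTypes().contains(AppConstant.BLOCK_KEY)) {
                analyzeResult.keyMap.put(blockId, block);
            } else {
                analyzeResult.valueMap.put(blockId, block);
            }
            break;
        default:
            break;
    }
}
// Keep going until the service stops returning a pagination token.
paginationToken = result.getNextToken();
if (paginationToken == null) {
    finished = true;
}
} // closes the enclosing pagination loop, whose opening is not part of this excerpt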
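The excerpt updates paginationToken and finished and ends with an extra closing brace, which suggests it runs inside a loop that requests one page of results at a time. Below is a minimal sketch of such a loop, not the answer's actual code. ResultPage, fetchPage, and collectAll are hypothetical stand-ins introduced for illustration; they are not part of the Textract SDK, and block IDs are reduced to plain strings.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

final class PaginationSketch {

    /** Hypothetical stand-in for one page of results: some block IDs plus a continuation token. */
    static final class ResultPage {
        final List<String> blockIds;
        final String nextToken; // null on the last page

        ResultPage(List<String> blockIds, String nextToken) {
            this.blockIds = blockIds;
            this.nextToken = nextToken;
        }
    }

    /** Calls fetchPage with the previous token (null at first) until no token comes back. */
    static List<String> collectAll(Function<String, ResultPage> fetchPage) {
        List<String> all = new ArrayList<>();
        String paginationToken = null;
        boolean finished = false;
        while (!finished) {
            ResultPage result = fetchPage.apply(paginationToken);
            all.addAll(result.blockIds);
            paginationToken = result.nextToken;
            if (paginationToken == null) {
                finished = true;
            }
        }
        return all;
    }

    public static void main(String[] args) {
        // Fake two-page result set keyed by the token that requests each page.
        Map<String, ResultPage> pages = new HashMap<>();
        List<String> first = new ArrayList<>();
        first.add("b1");
        first.add("b2");
        List<String> second = new ArrayList<>();
        second.add("b3");
        pages.put(null, new ResultPage(first, "t1"));
        pages.put("t1", new ResultPage(second, null));
        System.out.println(collectAll(pages::get)); // prints [b1, b2, b3]
    }
}
```

In real code the fetchPage step would be the SDK call that accepts the token from the previous response, and the per-block classification shown above would run inside the loop body.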
The following functions can be used to follow the relationships and find the key-value pairs:
public List<FormInfo> getKVRelationShip(AnalyzeResult analyzeResult) {
    List<FormInfo> listOfFormFields = new ArrayList<>();
    for (Map.Entry<String, Block> entry : analyzeResult.keyMap.entrySet()) {
        Block keyBlock = entry.getValue();
        // Follow the key's VALUE relationship to its value block.
        Block valueBlock = this.findValueBlock(keyBlock, analyzeResult.valueMap);
        if (valueBlock != null) {
            String key = getText(keyBlock, analyzeResult.blockMap);
            String val = getText(valueBlock, analyzeResult.blockMap);
            key = key != null ? key.trim() : "";
            val = val != null ? val.trim() : "";
            FormInfo formInfo = new FormInfo(key, val, keyBlock.getPage(),
                    keyBlock.getGeometry().getBoundingBox().getTop(), keyBlock.getConfidence());
            listOfFormFields.add(formInfo);
        }
    }
    // Sorting assumes FormInfo implements Comparable<FormInfo>.
    Collections.sort(listOfFormFields);
    return listOfFormFields;
}
public String getText(Block results, Map<String, Block> blockMap) {
    StringBuilder text = new StringBuilder();
    if (results.getRelationships() != null) {
        for (Relationship relationship : results.getRelationships()) {
            if (!relationship.getType().equals(AppConstant.BLOCK_CHILD)) {
                continue;
            }
            for (String childId : relationship.getIds()) {
                Block child = blockMap.get(childId);
                if (child == null) {
                    continue; // skip IDs that are missing from the map
                }
                if (child.getBlockType().equals(AppConstant.BLOCK_WORD)) {
                    text.append(child.getText()).append(" ");
                } else if (child.getBlockType().equals(AppConstant.BLOCK_SELECTION_ELEMENT)
                        && child.getSelectionStatus().equals(AppConstant.BLOCK_SELECTED)) {
                    // Render a selected checkbox or radio button as "X".
                    text.append("X");
                }
            }
        }
    }
    return text.toString();
}
private Block findValueBlock(Block block, Map<String, Block> valueMap) {
    Block valueBlock = null;
    if (block.getRelationships() == null) {
        return null; // a key block without relationships has no value
    }
    for (Relationship relationship : block.getRelationships()) {
        if (relationship.getType().equals(AppConstant.BLOCK_VALUE)) {
            // If several IDs are listed, the last one wins.
            for (String valueId : relationship.getIds()) {
                valueBlock = valueMap.get(valueId);
            }
        }
    }
    return valueBlock;
}
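The lookup chain in these functions (key block, then its VALUE relationship, then the CHILD words of each side) can be tried without the SDK. The sketch below is a simplified illustration under stated assumptions: SimpleBlock, link, textOf, and keyValuePairs are hypothetical names, each block carries a single entity type, and selection elements are left out.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class KeyValueSketch {

    /** Hypothetical, simplified stand-in for a Textract block. */
    static final class SimpleBlock {
        final String id;
        final String blockType;  // "KEY_VALUE_SET" or "WORD"
        final String entityType; // "KEY", "VALUE", or null for words
        final String text;       // set for WORD blocks only
        final Map<String, List<String>> relationships = new LinkedHashMap<>(); // type -> IDs

        SimpleBlock(String id, String blockType, String entityType, String text) {
            this.id = id;
            this.blockType = blockType;
            this.entityType = entityType;
            this.text = text;
        }

        /** Adds a relationship of the given type and returns this block for chaining. */
        SimpleBlock link(String type, String... ids) {
            relationships.put(type, Arrays.asList(ids));
            return this;
        }
    }

    /** Joins the text of the block's CHILD words with single spaces. */
    static String textOf(SimpleBlock block, Map<String, SimpleBlock> blockMap) {
        StringBuilder sb = new StringBuilder();
        for (String childId : block.relationships.getOrDefault("CHILD", Collections.<String>emptyList())) {
            SimpleBlock child = blockMap.get(childId);
            if (child != null && "WORD".equals(child.blockType)) {
                if (sb.length() > 0) {
                    sb.append(' ');
                }
                sb.append(child.text);
            }
        }
        return sb.toString();
    }

    /** Pairs each KEY block with the block named by its VALUE relationship. */
    static Map<String, String> keyValuePairs(List<SimpleBlock> blocks) {
        Map<String, SimpleBlock> blockMap = new LinkedHashMap<>();
        for (SimpleBlock b : blocks) {
            blockMap.put(b.id, b);
        }
        Map<String, String> pairs = new LinkedHashMap<>();
        for (SimpleBlock b : blocks) {
            if (!"KEY_VALUE_SET".equals(b.blockType) || !"KEY".equals(b.entityType)) {
                continue;
            }
            for (String valueId : b.relationships.getOrDefault("VALUE", Collections.<String>emptyList())) {
                SimpleBlock value = blockMap.get(valueId);
                if (value != null) {
                    pairs.put(textOf(b, blockMap), textOf(value, blockMap));
                }
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        List<SimpleBlock> blocks = Arrays.asList(
                new SimpleBlock("w1", "WORD", null, "First"),
                new SimpleBlock("w2", "WORD", null, "name:"),
                new SimpleBlock("w3", "WORD", null, "Ana"),
                new SimpleBlock("k1", "KEY_VALUE_SET", "KEY", null).link("VALUE", "v1").link("CHILD", "w1", "w2"),
                new SimpleBlock("v1", "KEY_VALUE_SET", "VALUE", null).link("CHILD", "w3"));
        System.out.println(keyValuePairs(blocks)); // prints {First name:=Ana}
    }
}
```

Because the sketch only relies on IDs, block types, and relationship types, the same three steps should carry over to the .NET Block type with little more than renaming.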
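Answer 1: (score: 0)
AWS provides Python sample code for key-value mapping in its documentation. It is not very complicated. You can work through the logic of the Python code and then implement it in your .NET project.
Here is the mapping code: https://docs.aws.amazon.com/textract/latest/dg/examples-extract-kvp.html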