I tried the Google Cloud Vision API (TEXT_DETECTION) on an image rotated by 90 degrees. It still returned the recognized text correctly. (See the image below.)
That means the engine can recognize text even when the image is rotated by 90, 180, or 270 degrees.
However, the response does not include any information about the correct image orientation. (Documentation: EntityAnnotation)
Is there any way to get not only the recognized text but also its orientation?
Could Google support this, similar to (FaceAnnotation: getRollAngle)?
Answer 0 (score: 4)
As stated in the Public Issue Tracker, our engineering team is now aware of this feature request, and there is currently no ETA for its implementation.
Note that orientation information may already be available in your image's metadata. An example of how to extract the metadata can be seen in this Third-party library.
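A minimal sketch of reading that metadata with Pillow (my library choice, not the answer's; it assumes the image actually carries EXIF data):

```python
from PIL import Image

# EXIF tag 274 ("Orientation") encodes how the camera was held.
EXIF_ORIENTATION_TAG = 274

def exif_orientation(image_source):
    """Return the EXIF orientation code (1-8), defaulting to 1 (upright) when absent."""
    with Image.open(image_source) as img:
        return img.getexif().get(EXIF_ORIENTATION_TAG, 1)
```

Keep in mind this only helps when the camera wrote the tag; many scanned or re-saved images carry no EXIF at all.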
A broad workaround is to check the "boundingPoly" "vertices" of the returned "textAnnotations". By calculating the width and height of each detected word's rectangle, you can determine that the image is not right-side up when the rectangle's 'height' > 'width' (i.e. the image is sideways).
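The width/height check above can be sketched like this (vertices written as plain dicts here; a real textAnnotations entry has the same x/y shape):

```python
def is_sideways(vertices):
    """Guess whether a word's bounding box is sideways.

    `vertices` is the four-point boundingPoly of one textAnnotations entry.
    """
    xs = [v["x"] for v in vertices]
    ys = [v["y"] for v in vertices]
    width = max(xs) - min(xs)
    height = max(ys) - min(ys)
    # Horizontal text is wider than tall; taller than wide suggests
    # the image is rotated 90 or 270 degrees.
    return height > width

# A box 100 px wide and 20 px tall: normal reading orientation.
upright = [{"x": 0, "y": 0}, {"x": 100, "y": 0}, {"x": 100, "y": 20}, {"x": 0, "y": 20}]
# The same word photographed sideways: 20 px wide, 100 px tall.
sideways = [{"x": 0, "y": 0}, {"x": 20, "y": 0}, {"x": 20, "y": 100}, {"x": 0, "y": 100}]
```

Note this heuristic cannot distinguish 0 from 180 degrees (or 90 from 270), and it misfires on genuinely tall words, so it is best applied over many words.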
Answer 1 (score: 3)
You can leverage the fact that we know the sequence of characters in a word to infer the word's orientation as follows (the logic is slightly different for non-LTR languages):
for page in annotation.pages:
    for block in page.blocks:
        for paragraph in block.paragraphs:
            for word in paragraph.words:
                if len(word.symbols) < MIN_WORD_LENGTH_FOR_ROTATION_INFERENCE:
                    continue
                first_char = word.symbols[0]
                last_char = word.symbols[-1]
                first_char_center = (np.mean([v.x for v in first_char.bounding_box.vertices]),
                                     np.mean([v.y for v in first_char.bounding_box.vertices]))
                last_char_center = (np.mean([v.x for v in last_char.bounding_box.vertices]),
                                    np.mean([v.y for v in last_char.bounding_box.vertices]))
                # top-right and bottom-right vertices of the first character's box
                top_right = first_char.bounding_box.vertices[1]
                bottom_right = first_char.bounding_box.vertices[2]
                # upright or upside down
                if np.abs(first_char_center[1] - last_char_center[1]) < np.abs(top_right.y - bottom_right.y):
                    if first_char_center[0] <= last_char_center[0]:  # upright
                        print(0)
                    else:  # upside down
                        print(180)
                else:  # sideways
                    if first_char_center[1] <= last_char_center[1]:
                        print(90)
                    else:
                        print(270)
You can then use the orientations of the individual words to infer the orientation of the document as a whole.
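One way to turn the per-word guesses into a page-level decision (my own aggregation, not part of the answer) is a simple majority vote:

```python
from collections import Counter

def page_orientation(word_orientations):
    """Pick the most common per-word rotation (0/90/180/270) as the page rotation."""
    if not word_orientations:
        return 0  # no usable words: assume upright
    return Counter(word_orientations).most_common(1)[0][0]
```

This tolerates a few misread words, since a handful of outliers will not flip the majority.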
Answer 2 (score: 2)
I am posting my workaround, which really works for images rotated by 90, 180, and 270 degrees. See the code below.
GetExifOrientation(annotateImageResponse.getTextAnnotations().get(1));

/**
 * @param ea The input EntityAnnotation must NOT be the first EntityAnnotation of
 *           annotateImageResponse.getTextAnnotations(), because the first one
 *           (the full-page text) is not affected by image orientation.
 * @return EXIF orientation (1, 3, 6, or 8)
 */
public static int GetExifOrientation(EntityAnnotation ea) {
    List<Vertex> vertexList = ea.getBoundingPoly().getVertices();
    // Calculate the center
    float centerX = 0, centerY = 0;
    for (int i = 0; i < 4; i++) {
        centerX += vertexList.get(i).getX();
        centerY += vertexList.get(i).getY();
    }
    centerX /= 4;
    centerY /= 4;
    int x0 = vertexList.get(0).getX();
    int y0 = vertexList.get(0).getY();
    if (x0 < centerX) {
        if (y0 < centerY) {
            // 0 -------- 1
            // |          |
            // 3 -------- 2
            return EXIF_ORIENTATION_NORMAL; // 1
        } else {
            // 1 -------- 2
            // |          |
            // 0 -------- 3
            return EXIF_ORIENTATION_270_DEGREE; // 6
        }
    } else {
        if (y0 < centerY) {
            // 3 -------- 0
            // |          |
            // 2 -------- 1
            return EXIF_ORIENTATION_90_DEGREE; // 8
        } else {
            // 2 -------- 3
            // |          |
            // 1 -------- 0
            return EXIF_ORIENTATION_180_DEGREE; // 3
        }
    }
}
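To actually correct an image once you have one of the four EXIF codes this method returns, the standard mapping to a clockwise "fix-up" rotation looks like this (the mapping is standard EXIF; the helper name is mine):

```python
# Standard EXIF orientation codes for pure rotations, mapped to the
# clockwise rotation (degrees) needed to bring the image upright.
EXIF_TO_ROTATION = {
    1: 0,    # normal
    3: 180,  # upside down
    6: 90,   # rotate 90 degrees clockwise to fix
    8: 270,  # rotate 270 degrees clockwise to fix
}

def rotation_for(exif_code):
    """Degrees to rotate clockwise; unknown codes are treated as upright."""
    return EXIF_TO_ROTATION.get(exif_code, 0)
```

(Codes 2, 4, 5, and 7 involve mirroring and cannot be produced by this detection method, so they are omitted here.)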
More information:
I found that I have to add a language hint to make annotateImageResponse.getTextAnnotations().get(1) always follow this rule.
Sample code to add a language hint:
ImageContext imageContext = new ImageContext();
String [] languages = { "zh-TW" };
imageContext.setLanguageHints(Arrays.asList(languages));
annotateImageRequest.setImageContext(imageContext);
Answer 3 (score: 0)
Sometimes it is impossible to get the orientation from the metadata, for example when the user took the photo with a mobile camera held in the wrong orientation. My solution is based on Jack Fan's answer and on google-api-services-vision (available via Maven).
My TextUnit class:
public class TextUnit {
    private String text;
    // X of the lower left point
    private float llx;
    // Y of the lower left point
    private float lly;
    // X of the upper right point
    private float urx;
    // Y of the upper right point
    private float ury;

    public TextUnit(String text, float llx, float lly, float urx, float ury) {
        this.text = text;
        this.llx = llx;
        this.lly = lly;
        this.urx = urx;
        this.ury = ury;
    }
}
The basic method:
List<TextUnit> extractData(BatchAnnotateImagesResponse response) throws AnnotateImageResponseException {
    List<TextUnit> data = new ArrayList<>();
    for (AnnotateImageResponse res : response.getResponses()) {
        if (null != res.getError()) {
            String errorMessage = res.getError().getMessage();
            logger.log(Level.WARNING, "AnnotateImageResponse ERROR: " + errorMessage);
            throw new AnnotateImageResponseException("AnnotateImageResponse ERROR: " + errorMessage);
        } else {
            List<EntityAnnotation> texts = res.getTextAnnotations();
            if (texts.size() > 0) {
                // get orientation
                EntityAnnotation first_word = texts.get(1);
                int orientation;
                try {
                    orientation = getExifOrientation(first_word);
                } catch (NullPointerException e) {
                    try {
                        orientation = getExifOrientation(texts.get(2));
                    } catch (NullPointerException e1) {
                        orientation = EXIF_ORIENTATION_NORMAL;
                    }
                }
                logger.log(Level.INFO, "orientation: " + orientation);

                // calculate the center
                float centerX = 0, centerY = 0;
                for (Vertex vertex : first_word.getBoundingPoly().getVertices()) {
                    if (vertex.getX() != null) {
                        centerX += vertex.getX();
                    }
                    if (vertex.getY() != null) {
                        centerY += vertex.getY();
                    }
                }
                centerX /= 4;
                centerY /= 4;

                for (int i = 1; i < texts.size(); i++) { // exclude the first text - it contains all the text of the page
                    String blockText = texts.get(i).getDescription();
                    BoundingPoly poly = texts.get(i).getBoundingPoly();
                    try {
                        float llx = 0;
                        float lly = 0;
                        float urx = 0;
                        float ury = 0;
                        if (orientation == EXIF_ORIENTATION_NORMAL) {
                            poly = invertSymmetricallyBy0X(centerY, poly);
                            llx = getLlx(poly);
                            lly = getLly(poly);
                            urx = getUrx(poly);
                            ury = getUry(poly);
                        } else if (orientation == EXIF_ORIENTATION_90_DEGREE) {
                            // rotate back, then invert by x
                            poly = rotate(centerX, centerY, poly, Math.toRadians(-90));
                            poly = invertSymmetricallyBy0Y(centerX, poly);
                            llx = getLlx(poly);
                            lly = getLly(poly);
                            urx = getUrx(poly);
                            ury = getUry(poly);
                        } else if (orientation == EXIF_ORIENTATION_180_DEGREE) {
                            poly = rotate(centerX, centerY, poly, Math.toRadians(-180));
                            poly = invertSymmetricallyBy0Y(centerX, poly);
                            llx = getLlx(poly);
                            lly = getLly(poly);
                            urx = getUrx(poly);
                            ury = getUry(poly);
                        } else if (orientation == EXIF_ORIENTATION_270_DEGREE) {
                            // rotate back, then invert by x
                            poly = rotate(centerX, centerY, poly, Math.toRadians(-270));
                            poly = invertSymmetricallyBy0Y(centerX, poly);
                            llx = getLlx(poly);
                            lly = getLly(poly);
                            urx = getUrx(poly);
                            ury = getUry(poly);
                        }
                        data.add(new TextUnit(blockText, llx, lly, urx, ury));
                    } catch (NullPointerException e) {
                        // ignore - some polys have no X or Y coordinate when the text is located close to the bounds
                    }
                }
            }
        }
    }
    return data;
}
Helper methods:
private float getLlx(BoundingPoly poly) {
    try {
        List<Vertex> vertices = poly.getVertices();
        ArrayList<Float> xs = new ArrayList<>();
        for (Vertex v : vertices) {
            float x = 0;
            if (v.getX() != null) {
                x = v.getX();
            }
            xs.add(x);
        }
        Collections.sort(xs);
        return (xs.get(0) + xs.get(1)) / 2;
    } catch (Exception e) {
        return 0;
    }
}

private float getLly(BoundingPoly poly) {
    try {
        List<Vertex> vertices = poly.getVertices();
        ArrayList<Float> ys = new ArrayList<>();
        for (Vertex v : vertices) {
            float y = 0;
            if (v.getY() != null) {
                y = v.getY();
            }
            ys.add(y);
        }
        Collections.sort(ys);
        return (ys.get(0) + ys.get(1)) / 2;
    } catch (Exception e) {
        return 0;
    }
}

private float getUrx(BoundingPoly poly) {
    try {
        List<Vertex> vertices = poly.getVertices();
        ArrayList<Float> xs = new ArrayList<>();
        for (Vertex v : vertices) {
            float x = 0;
            if (v.getX() != null) {
                x = v.getX();
            }
            xs.add(x);
        }
        Collections.sort(xs);
        return (xs.get(xs.size() - 1) + xs.get(xs.size() - 2)) / 2;
    } catch (Exception e) {
        return 0;
    }
}

private float getUry(BoundingPoly poly) {
    try {
        List<Vertex> vertices = poly.getVertices();
        ArrayList<Float> ys = new ArrayList<>();
        for (Vertex v : vertices) {
            float y = 0;
            if (v.getY() != null) {
                y = v.getY();
            }
            ys.add(y);
        }
        Collections.sort(ys);
        return (ys.get(ys.size() - 1) + ys.get(ys.size() - 2)) / 2;
    } catch (Exception e) {
        return 0;
    }
}
/**
 * Rotates a rectangle clockwise.
 *
 * @param poly  the bounding polygon to rotate
 * @param theta the angle of rotation in radians
 * @return the rotated polygon
 */
public BoundingPoly rotate(float centerX, float centerY, BoundingPoly poly, double theta) {
    List<Vertex> vertexList = poly.getVertices();
    // rotate all vertices in the poly around the center
    for (Vertex vertex : vertexList) {
        float tempX = vertex.getX() - centerX;
        float tempY = vertex.getY() - centerY;
        // now apply the rotation
        float rotatedX = (float) (centerX - tempX * Math.cos(theta) + tempY * Math.sin(theta));
        float rotatedY = (float) (centerY - tempX * Math.sin(theta) - tempY * Math.cos(theta));
        vertex.setX((int) rotatedX);
        vertex.setY((int) rotatedY);
    }
    return poly;
}
/**
 * Since the Google Vision API returns boundingPoly-s whose coordinates start from the
 * top left corner, but iText uses a coordinate system whose origin is at the bottom left,
 * we need to invert the result in order to keep working with iText.
 *
 * @return text units inverted symmetrically about the 0X axis.
 */
private BoundingPoly invertSymmetricallyBy0X(float centerY, BoundingPoly poly) {
    List<Vertex> vertices = poly.getVertices();
    for (Vertex v : vertices) {
        if (v.getY() != null) {
            v.setY((int) (centerY + (centerY - v.getY())));
        }
    }
    return poly;
}

/**
 * @param centerX
 * @param poly
 * @return text units inverted symmetrically about the 0Y axis.
 */
private BoundingPoly invertSymmetricallyBy0Y(float centerX, BoundingPoly poly) {
    List<Vertex> vertices = poly.getVertices();
    for (Vertex v : vertices) {
        if (v.getX() != null) {
            v.setX((int) (centerX + (centerX - v.getX())));
        }
    }
    return poly;
}
Answer 4 (score: 0)
Jack Fan's answer worked for me. Here is my vanilla JS version.
/**
 * @param gOCR The Google Vision response
 * @return orientation (0, 90, 180 or 270)
 */
function getOrientation(gOCR) {
    var vertexList = gOCR.responses[0].textAnnotations[1].boundingPoly.vertices;

    const ORIENTATION_NORMAL = 0;
    const ORIENTATION_270_DEGREE = 270;
    const ORIENTATION_90_DEGREE = 90;
    const ORIENTATION_180_DEGREE = 180;

    var centerX = 0, centerY = 0;
    for (var i = 0; i < 4; i++) {
        centerX += vertexList[i].x;
        centerY += vertexList[i].y;
    }
    centerX /= 4;
    centerY /= 4;

    var x0 = vertexList[0].x;
    var y0 = vertexList[0].y;
    if (x0 < centerX) {
        if (y0 < centerY) {
            return ORIENTATION_NORMAL;
        } else {
            return ORIENTATION_270_DEGREE;
        }
    } else {
        if (y0 < centerY) {
            return ORIENTATION_90_DEGREE;
        } else {
            return ORIENTATION_180_DEGREE;
        }
    }
}
Answer 5 (score: 0)
Usually we need to know the actual rotation angle of the text in a photo, and the coordinate information in the API response is already complete enough for that. Simply calculate the angle between xy1 and xy0 to get the rotation angle.
// reset
self.transform = CGAffineTransformIdentity;
CGFloat x_0 = viewData.bounds[0].x;
CGFloat y_0 = viewData.bounds[0].y;
CGFloat x_1 = viewData.bounds[1].x;
CGFloat y_1 = viewData.bounds[1].y;
CGFloat x_3 = viewData.bounds[3].x;
CGFloat y_3 = viewData.bounds[3].y;
// distance
CGFloat width = sqrt(pow(x_0 - x_1, 2) + pow(y_0 - y_1, 2));
CGFloat height = sqrt(pow(x_0 - x_3, 2) + pow(y_0 - y_3, 2));
self.size = CGSizeMake(width, height);
// angle
CGFloat angle = atan2((y_1 - y_0), (x_1 - x_0));
// rotation
self.transform = CGAffineTransformRotate(CGAffineTransformIdentity, angle);
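The same atan2 computation, sketched in Python for reference (vertex layout assumed as in the answers above: index 0 is the word's top-left corner, index 1 its top-right):

```python
import math

def text_angle(vertices):
    """Rotation of the text's top edge in degrees.

    Image coordinates have y pointing down, so a positive angle
    means the text runs clockwise from horizontal.
    """
    dx = vertices[1]["x"] - vertices[0]["x"]
    dy = vertices[1]["y"] - vertices[0]["y"]
    return math.degrees(math.atan2(dy, dx))
```

Unlike the 0/90/180/270 classifiers above, this gives a continuous angle, which is useful for slightly skewed photos.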
Answer 6 (score: 0)
The response of the v1 REST endpoint already includes orientationDegrees:
https://cloud.google.com/vision/docs/reference/rest/v1/AnnotateImageResponse#Page
Unfortunately, google-cloud-vision 3.2.0 does not have this yet: https://github.com/googleapis/python-vision/issues/156
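If you call the v1 REST endpoint directly, the field can be read out of the raw JSON response along these lines (a sketch with the surrounding response shape abbreviated; the field is only present when the API populates it, so the code falls back to 0):

```python
def page_orientation_degrees(response_json):
    """Extract orientationDegrees from a v1 REST AnnotateImageResponse dict."""
    pages = response_json.get("fullTextAnnotation", {}).get("pages", [])
    # Fall back to 0 (upright) when pages or the field are absent.
    return pages[0].get("orientationDegrees", 0) if pages else 0

sample = {"fullTextAnnotation": {"pages": [{"orientationDegrees": 90}]}}
```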