我试图在图像上绘制文本边界框。图像 用给定的系数集进行透视变换。转换前的文本坐标是已知的,我想在转换后计算文本的坐标。
根据我的理解,如果我将使用图像变换中使用的系数的透视变换应用于文本坐标,我将获得变换后文本的结果坐标。但是,文本不会出现在它应该出现的位置。
较小的白框可以很好地限制文本,因为我知道文本的坐标。
由于在转换坐标时出现一些错误,较小的白框未绑定文本。
我按照文档 reference for coefficients of perspective transformation 并使用以下代码查找图像变换系数:origin of the code is from this answer
def find_coeffs(pa, pb):
'''
find the coefficients for perspective transform.
parameters:
pa : verticies in the resulting plane
pb : verticies in the current plane
retrun:
coeffs : 8- tuple
coefficents for PIL perspective transform
'''
matrix = []
for p1, p2 in zip(pa, pb):
matrix.append([p1[0], p1[1], 1, 0, 0, 0, -p2[0]*p1[0], -p2[0]*p1[1]])
matrix.append([0, 0, 0, p1[0], p1[1], 1, -p2[1]*p1[0], -p2[1]*p1[1]])
A = np.matrix(matrix, dtype=np.float)
B = np.array(pb).reshape(8)
res = np.dot(np.linalg.inv(A.T * A) * A.T, B)
return np.array(res).reshape(8)
我的文本边界框转换代码:
# perspective transformation
a, b, c, d, e, f, g, h = coeffs
# return two vertices defining the bounding box
new_x0 = float(a * new_x0 - b * new_y0 + c) / float(g * new_x0 + h * new_y0 + 1)
new_y0 = float(d * new_x0 + e * new_y0 + f) / float(g * new_x0 + h * new_y0 + 1)
new_x1 = float(a * new_x1 - b * new_y1 + c) / float(g * new_x1 + h * new_y1 + 1)
new_y1 = float(d * new_x1 + e * new_y1 + f) / float(g * new_x1 + h * new_y1 + 1)
我也去了Pillow Github,但我找不到定义透视变换的源代码。
有关透视变换数学的更多信息。 The Geometry of Perspective Drawing on the Computer
感谢。
答案 0 :(得分:0)
要在转换后计算新点,您应该从A-> B而不是B-> A获得系数,这是PIL库的标准。例如:
# A1, B1 ... are points
# direct transform
coefs = find_coefs([B1, B2, B3, B4], [A1, A2, A3, A4])
# inverse transform
coefs_inv = find_coefs([A1, A2, A3, A4], [B1, B2, B3, B4])
您使用image.transform()
调用coefs_inv
函数,但使用coefs
计算新点以得到如下结果:
img = image.transform(((1500,800)),
method=Image.PERSPECTIVE,
data=coefs_inv)
a, b, c, d, e, f, g, h = coefs
old_p1 = [50, 100]
x,y = old_p1
new_x = (a * x + b * y + c) / (g * x + h * y + 1)
new_y = (d * x + e * y + f) / (g * x + h * y + 1)
new_p1 = (int(new_x),int(new_y))
old_p2 = [400, 500]
x,y = old_p2
new_x = (a * x + b * y + c) / (g * x + h * y + 1)
new_y = (d * x + e * y + f) / (g * x + h * y + 1)
new_p2 = (int(new_x),int(new_y))
下面的完整代码:
import os
from PIL import Image
import numpy as np
import matplotlib.pyplot as plt
def find_coefs(original_coords, warped_coords):
matrix = []
for p1, p2 in zip(original_coords, warped_coords):
matrix.append([p1[0], p1[1], 1, 0, 0, 0, -p2[0]*p1[0], -p2[0]*p1[1]])
matrix.append([0, 0, 0, p1[0], p1[1], 1, -p2[1]*p1[0], -p2[1]*p1[1]])
A = np.matrix(matrix, dtype=np.float)
B = np.array(warped_coords).reshape(8)
res = np.dot(np.linalg.inv(A.T * A) * A.T, B)
return np.array(res).reshape(8)
coefs = find_coefs(
[(867,652), (1020,580), (1206,666), (1057,757)],
[(700,732), (869,754), (906,916), (712,906)]
)
coefs_inv = find_coefs(
[(700,732), (869,754), (906,916), (712,906)],
[(867,652), (1020,580), (1206,666), (1057,757)]
)
image = Image.open('sample.png')
img = image.transform(((1500,800)),
method=Image.PERSPECTIVE,
data=coefs_inv)
a, b, c, d, e, f, g, h = coefs
old_p1 = [50, 100]
x,y = old_p1
new_x = (a * x + b * y + c) / (g * x + h * y + 1)
new_y = (d * x + e * y + f) / (g * x + h * y + 1)
new_p1 = (int(new_x),int(new_y))
old_p2 = [400, 500]
x,y = old_p2
new_x = (a * x + b * y + c) / (g * x + h * y + 1)
new_y = (d * x + e * y + f) / (g * x + h * y + 1)
new_p2 = (int(new_x),int(new_y))
plt.figure()
plt.imshow(image)
plt.scatter([old_p1[0], old_p2[0]],[old_p1[1], old_p2[1]] , s=150, marker='.', c='b')
plt.show()
plt.figure()
plt.imshow(img)
plt.scatter([new_p1[0], new_p2[0]],[new_p1[1], new_p2[1]] , s=150, marker='.', c='r')
plt.show()