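I have an MS Word document containing text boxes (rectangles), and I have successfully converted it to PDF with LibreOffice. How can I find all the text boxes (rectangles) in the PDF, and how should I interpret the rectangles' coordinates?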
@Override
public void modifyPath(PathConstructionRenderInfo renderInfo) {
    if (renderInfo.getOperation() == PathConstructionRenderInfo.RECT) {
        // Operands of the `re` operator: corner (x, y), width and height,
        // all expressed in the current user space.
        float x = renderInfo.getSegmentData().get(0);
        float y = renderInfo.getSegmentData().get(1);
        float w = renderInfo.getSegmentData().get(2);
        float h = renderInfo.getSegmentData().get(3);
        // Map two opposite corners through the current transformation matrix (CTM).
        Vector a = new Vector(x, y, 1).cross(renderInfo.getCtm());
        Vector c = new Vector(x + w, y + h, 1).cross(renderInfo.getCtm());
    }
}
Implementing ExtRenderListener this way only finds the page (A4) rectangle. It does not find the (text box) rectangles that enclose the content on the page.
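On the coordinate part of the question: the operands of `re` are given in the current user space, and applying the CTM maps them to default user space, where one unit is 1/72 inch and the origin is typically the lower-left corner of the page (an A4 page is roughly 595 x 842 units). The following is a minimal, library-free sketch of that mapping. The class `CtmDemo`, its method names and the matrix values are hypothetical and chosen only for illustration; a real CTM must be taken from the render info.

```java
final class CtmDemo {
    /** Applies a PDF matrix [a b c d e f] to (x, y): x' = a*x + c*y + e, y' = b*x + d*y + f. */
    static double[] transform(double[] m, double x, double y) {
        return new double[] {m[0] * x + m[2] * y + m[4], m[1] * x + m[3] * y + m[5]};
    }

    /** Returns {minX, minY, maxX, maxY} in default user space for an `re` rectangle. */
    static double[] toPageBox(double[] ctm, double x, double y, double w, double h) {
        // Transform all four corners, because a CTM may flip or rotate the rectangle.
        double[][] corners = {
            transform(ctm, x, y), transform(ctm, x + w, y),
            transform(ctm, x, y + h), transform(ctm, x + w, y + h)
        };
        double[] box = {Double.MAX_VALUE, Double.MAX_VALUE, -Double.MAX_VALUE, -Double.MAX_VALUE};
        for (double[] p : corners) {
            box[0] = Math.min(box[0], p[0]);
            box[1] = Math.min(box[1], p[1]);
            box[2] = Math.max(box[2], p[0]);
            box[3] = Math.max(box[3], p[1]);
        }
        return box;
    }

    public static void main(String[] args) {
        // Hypothetical CTM: scale by 0.5 and flip the y axis around the top of an A4 page.
        double[] ctm = {0.5, 0, 0, -0.5, 0, 842};
        double[] box = toPageBox(ctm, 200, 400, 100, 60);
        System.out.println(java.util.Arrays.toString(box)); // [100.0, 612.0, 150.0, 642.0]
    }
}
```

Taking the minimum and maximum over all four corners matters when the CTM flips an axis, as in this example: the corner that was "lower-left" in user space ends up at the top of the box on the page.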
Answer 0 (score: 2)
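As Bruno pointed out, the problem is that you may encounter rectangles that are defined only by line-to or move-to operations.
You will need to keep track of all line-drawing operations and "aggregate" them as soon as they connect, that is, whenever a line is drawn whose start or end point matches the start or end point of an already known line.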
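The bookkeeping described above can be sketched independently of iText with a small union-find structure. This is an illustration under assumptions, not the class from this answer: `SegmentClusters`, its methods and the `eps` tolerance are hypothetical names, and segments are plain `{x1, y1, x2, y2}` arrays rather than iText `Line` objects.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class SegmentClusters {
    private final List<double[]> segments = new ArrayList<>(); // each entry: {x1, y1, x2, y2}
    private final List<Integer> parent = new ArrayList<>();    // union-find parent pointers
    private final double eps;                                  // endpoint matching tolerance

    SegmentClusters(double eps) { this.eps = eps; }

    /** Registers a segment and merges it with every known segment that shares an endpoint. */
    void add(double x1, double y1, double x2, double y2) {
        int id = segments.size();
        segments.add(new double[] {x1, y1, x2, y2});
        parent.add(id);
        for (int other = 0; other < id; other++) {
            if (sharesEndpoint(segments.get(other), segments.get(id))) union(other, id);
        }
    }

    private boolean sharesEndpoint(double[] s, double[] t) {
        return close(s[0], s[1], t[0], t[1]) || close(s[0], s[1], t[2], t[3])
            || close(s[2], s[3], t[0], t[1]) || close(s[2], s[3], t[2], t[3]);
    }

    private boolean close(double ax, double ay, double bx, double by) {
        return Math.abs(ax - bx) <= eps && Math.abs(ay - by) <= eps;
    }

    private int find(int id) {
        while (parent.get(id) != id) {
            parent.set(id, parent.get(parent.get(id))); // path halving
            id = parent.get(id);
        }
        return id;
    }

    private void union(int a, int b) {
        int ra = find(a), rb = find(b);
        if (ra != rb) parent.set(rb, ra);
    }

    /** Bounding boxes {minX, minY, maxX, maxY} of clusters with at least minSegments segments. */
    List<double[]> boundingBoxes(int minSegments) {
        Map<Integer, double[]> boxes = new LinkedHashMap<>();
        Map<Integer, Integer> counts = new HashMap<>();
        for (int i = 0; i < segments.size(); i++) {
            int root = find(i);
            double[] s = segments.get(i);
            double[] b = boxes.computeIfAbsent(root, k -> new double[] {
                Double.MAX_VALUE, Double.MAX_VALUE, -Double.MAX_VALUE, -Double.MAX_VALUE});
            b[0] = Math.min(b[0], Math.min(s[0], s[2]));
            b[1] = Math.min(b[1], Math.min(s[1], s[3]));
            b[2] = Math.max(b[2], Math.max(s[0], s[2]));
            b[3] = Math.max(b[3], Math.max(s[1], s[3]));
            counts.merge(root, 1, Integer::sum);
        }
        List<double[]> out = new ArrayList<>();
        for (Map.Entry<Integer, double[]> e : boxes.entrySet()) {
            if (counts.get(e.getKey()) >= minSegments) out.add(e.getValue());
        }
        return out;
    }
}
```

Comparing every new segment against all known ones is quadratic in the number of segments, which is usually acceptable for a single page. Unlike a scheme that attaches a line to the first match only, the union step here also joins two existing clusters when a new line touches both.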
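// Package names as of iText 7.1 and later (7.0.x uses com.itextpdf.kernel.color).
import com.itextpdf.kernel.colors.Color;
import com.itextpdf.kernel.colors.DeviceGray;
import com.itextpdf.kernel.colors.IccBased;
import com.itextpdf.kernel.geom.IShape;
import com.itextpdf.kernel.geom.Line;
import com.itextpdf.kernel.geom.Path;
import com.itextpdf.kernel.geom.Point;
import com.itextpdf.kernel.geom.Rectangle;
import com.itextpdf.kernel.geom.Subpath;
import com.itextpdf.kernel.pdf.canvas.parser.EventType;
import com.itextpdf.kernel.pdf.canvas.parser.data.IEventData;
import com.itextpdf.kernel.pdf.canvas.parser.data.PathRenderInfo;
import com.itextpdf.kernel.pdf.canvas.parser.listener.IEventListener;

import java.util.Collection;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class RectangleFinder implements IEventListener {

    // Every line segment seen so far, mapped to a sequential ID.
    private Map<Line, Integer> knownLines = new HashMap<>();
    // Maps a line ID to the ID of the cluster root it was attached to.
    private Map<Integer, Integer> clusters = new HashMap<>();
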
    @Override
    public void eventOccurred(IEventData data, EventType type) {
        if (data instanceof PathRenderInfo) {
            PathRenderInfo pathRenderInfo = (PathRenderInfo) data;
            pathRenderInfo.preserveGraphicsState();
            Path path = pathRenderInfo.getPath();
            // Skip paths that are neither filled nor stroked.
            if (pathRenderInfo.getOperation() == PathRenderInfo.NO_OP)
                return;
            // Only consider paths that are filled (and not also stroked).
            if (pathRenderInfo.getOperation() != PathRenderInfo.FILL)
                return;
            // Only consider paths filled with black.
            if (!isBlack(pathRenderInfo.getFillColor()))
                return;
            // Feed every straight segment of the path into the clustering.
            for (Subpath sPath : path.getSubpaths()) {
                for (IShape segment : sPath.getSegments()) {
                    if (segment instanceof Line) {
                        lineOccurred((Line) segment);
                    }
                }
            }
        }
    }

    // A color counts as black if it has a single component whose value is 0.
    private boolean isBlack(Color c) {
        if (c instanceof IccBased) {
            IccBased col01 = (IccBased) c;
            return col01.getNumberOfComponents() == 1 && col01.getColorValue()[0] == 0.0f;
        }
        if (c instanceof DeviceGray) {
            DeviceGray col02 = (DeviceGray) c;
            return col02.getNumberOfComponents() == 1 && col02.getColorValue()[0] == 0.0f;
        }
        return false;
    }

    private void lineOccurred(Line line) {
        // Look up the line's ID, assigning a new one on first sight.
        int ID = 0;
        if (!knownLines.containsKey(line)) {
            ID = knownLines.size();
            knownLines.put(line, ID);
        } else {
            ID = knownLines.get(line);
        }
        Point start = line.getBasePoints().get(0);
        Point end = line.getBasePoints().get(1);
        // Attach the line to the cluster of the first known line sharing an endpoint.
        for (Line line2 : knownLines.keySet()) {
            if (line.equals(line2))
                continue;
            if (line2.getBasePoints().get(0).equals(start)
                    || line2.getBasePoints().get(1).equals(end)
                    || line2.getBasePoints().get(0).equals(end)
                    || line2.getBasePoints().get(1).equals(start)) {
                int ID2 = find(knownLines.get(line2));
                // Never map an ID to itself: that would make find() loop forever.
                if (ID2 != ID)
                    clusters.put(ID, ID2);
                break;
            }
        }
    }

    // Follows the cluster links until it reaches a root ID.
    private int find(int ID) {
        int out = ID;
        while (clusters.containsKey(out))
            out = clusters.get(out);
        return out;
    }

    @Override
    public Set<EventType> getSupportedEvents() {
        // Returning null tells iText that this listener accepts every event type.
        return null;
    }

    public Collection<Set<Line>> getClusters() {
        // Group every known line under the root ID of its cluster, so that the
        // root line itself is part of its cluster as well.
        Map<Integer, Set<Line>> out = new HashMap<>();
        for (Line l : knownLines.keySet()) {
            int clusterID = find(knownLines.get(l));
            out.computeIfAbsent(clusterID, k -> new HashSet<Line>()).add(l);
        }
        // Drop lines that never connected to any other line.
        out.values().removeIf(cluster -> cluster.size() < 2);
        return out.values();
    }

    public Collection<Rectangle> getBoundingBoxes() {
        Set<Rectangle> rectangles = new HashSet<>();
        for (Set<Line> cluster : getClusters()) {
            // Bounding box over all endpoints of the cluster's lines.
            double minX = Double.MAX_VALUE;
            double minY = Double.MAX_VALUE;
            double maxX = -Double.MAX_VALUE;
            double maxY = -Double.MAX_VALUE;
            for (Line l : cluster) {
                for (Point p : l.getBasePoints()) {
                    minX = Math.min(minX, p.x);
                    minY = Math.min(minY, p.y);
                    maxX = Math.max(maxX, p.x);
                    maxY = Math.max(maxY, p.y);
                }
            }
            double w = (maxX - minX);
            double h = (maxY - minY);
            rectangles.add(new Rectangle((float) minX, (float) minY, (float) w, (float) h));
        }
        return rectangles;
    }
}
This is a class I wrote to find black (filled) rectangles on a page. With slight adjustments, it can find other rectangles as well.
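One possible adjustment, offered as an assumption rather than as part of the class above: a bounding box is produced for any group of connected lines, so it can help to check that a cluster really outlines an axis-aligned rectangle. The sketch below is library-free and hypothetical (`RectangleCheck`, `isAxisAlignedRectangle` and `eps` are illustrative names). It assumes segments are `{x1, y1, x2, y2}` arrays that do not overlap each other.

```java
import java.util.List;

final class RectangleCheck {
    /**
     * Returns true if the segments lie entirely on the edges of their common
     * bounding box and their total length equals the perimeter of that box.
     */
    static boolean isAxisAlignedRectangle(List<double[]> segs, double eps) {
        if (segs.size() < 4) return false;
        double minX = Double.MAX_VALUE, minY = Double.MAX_VALUE;
        double maxX = -Double.MAX_VALUE, maxY = -Double.MAX_VALUE;
        for (double[] s : segs) {
            minX = Math.min(minX, Math.min(s[0], s[2]));
            minY = Math.min(minY, Math.min(s[1], s[3]));
            maxX = Math.max(maxX, Math.max(s[0], s[2]));
            maxY = Math.max(maxY, Math.max(s[1], s[3]));
        }
        double w = maxX - minX, h = maxY - minY;
        if (w <= eps || h <= eps) return false; // degenerate: a single line or point
        double total = 0;
        for (double[] s : segs) {
            // Vertical segment on the left or right edge of the box.
            boolean onVerticalEdge = Math.abs(s[0] - s[2]) <= eps
                && (Math.abs(s[0] - minX) <= eps || Math.abs(s[0] - maxX) <= eps);
            // Horizontal segment on the bottom or top edge of the box.
            boolean onHorizontalEdge = Math.abs(s[1] - s[3]) <= eps
                && (Math.abs(s[1] - minY) <= eps || Math.abs(s[1] - maxY) <= eps);
            if (!onVerticalEdge && !onHorizontalEdge) return false;
            total += Math.abs(s[2] - s[0]) + Math.abs(s[3] - s[1]);
        }
        // Without overlaps, the edges are fully covered exactly when the lengths add up.
        return Math.abs(total - 2 * (w + h)) <= 2 * eps * segs.size();
    }
}
```

A check like this rejects clusters that contain diagonals or interior lines, such as table grids, at the cost of also rejecting rotated rectangles.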