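My goal is to draw a rectangle over the text I searched for.
I have implemented a LocationTextExtractionStrategy class that joins the text chunks into sentences (one per line) and returns the starting X and Y position of each.
I am using the solution from "Getting Coordinates of string using ITextExtractionStrategy and LocationTextExtractionStrategy in Itextsharp", and this is what I have working so far (the code that organizes the chunks is below).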
When I try to draw a rectangle over text that spans a few lines, the rectangle does not even land near the text.
public override void RenderText(TextRenderInfo renderInfo)
{
    LineSegment segment = renderInfo.GetBaseline();
    if (renderInfo.GetRise() != 0)
    {
        // remove the rise from the baseline - we do this because the text from a super/subscript render
        // operation should probably be considered as part of the baseline of the text the super/sub is relative to
        Matrix riseOffsetTransform = new Matrix(0, -renderInfo.GetRise());
        segment = segment.TransformBy(riseOffsetTransform);
    }
    TextChunk tc = new TextChunk(renderInfo.GetText(), tclStrat.CreateLocation(renderInfo, segment));
    locationalResult.Add(tc);
}

public IList<TextLocation> GetLocations()
{
    var filteredTextChunks = filterTextChunks(locationalResult, null);
    filteredTextChunks.Sort();
    TextChunk lastChunk = null;
    var textLocations = new List<TextLocation>();
    foreach (var chunk in filteredTextChunks)
    {
        if (lastChunk == null)
        {
            // initial
            textLocations.Add(new TextLocation
            {
                Text = chunk.Text,
                X = chunk.Location.StartLocation[0],
                Y = chunk.Location.StartLocation[1]
            });
        }
        else
        {
            if (chunk.SameLine(lastChunk))
            {
                var text = "";
                // we only insert a blank space if the trailing character of the previous string wasn't a space,
                // and the leading character of the current string isn't a space
                if (IsChunkAtWordBoundary(chunk, lastChunk) && !StartsWithSpace(chunk.Text) && !EndsWithSpace(lastChunk.Text))
                    text += ' ';
                text += chunk.Text;
                textLocations[textLocations.Count - 1].Text += text;
            }
            else
            {
                textLocations.Add(new TextLocation
                {
                    Text = chunk.Text,
                    X = chunk.Location.StartLocation[0],
                    Y = chunk.Location.StartLocation[1]
                });
            }
        }
        lastChunk = chunk;
    }
    // now find the location(s) with the given texts
    return textLocations;
}
Answer 0 (score: 1)
If you use iText 7 together with pdfSweep, it actually has built-in functionality for this.
RegexBasedCleanupStrategy st = new RegexBasedCleanupStrategy("the_word_to_highlight");
PdfAutoSweep sweep = new PdfAutoSweep(st);
PdfDocument pdfDocument = new PdfDocument(new PdfReader(inputfile));
sweep.highlight(pdfDocument);
pdfDocument.close();
This will highlight the word you are looking for. With only some minor configuration you can of course do more.
Answer 1 (score: 0)
Please set the RotateContents property of your PdfStamper to false after instantiating the stamper.
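Your sample PDF has rotated pages. In that case iText 5.x by default tries to help you by interpreting the coordinates in your drawing instructions in a correspondingly rotated coordinate system. The text extraction coordinate system stays unchanged, though, so drawing something at the extracted coordinates fails on rotated pages. The setting above disables this help.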
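To make the mismatch concrete, here is a minimal sketch in plain Java (no iText classes involved). It assumes, purely for illustration, that the drawing layer prepends a 90-degree rotation matrix of the form [0 1 -1 0 h 0]. The class name RotationSketch, the method apply, the value h = 595, and the sample point are all hypothetical and not taken from the iText API or from your PDF.

```java
// Illustrative only: models a rotated drawing coordinate system with plain arithmetic.
final class RotationSketch {

    // Applies a PDF-style affine matrix [a b c d e f] to the point (x, y):
    //   x' = a*x + c*y + e
    //   y' = b*x + d*y + f
    static double[] apply(double[] m, double x, double y) {
        return new double[] {
            m[0] * x + m[2] * y + m[4],
            m[1] * x + m[3] * y + m[5]
        };
    }

    public static void main(String[] args) {
        double h = 595;                           // assumed translation (hypothetical value)
        double[] rotate90 = {0, 1, -1, 0, h, 0};  // assumed 90-degree rotation matrix
        double[] extracted = {100, 500};          // coordinates as text extraction would report them

        // The same numbers, interpreted in the rotated drawing system, end up elsewhere.
        double[] drawn = apply(rotate90, extracted[0], extracted[1]);
        System.out.println("extracted: (" + extracted[0] + ", " + extracted[1] + ")");
        System.out.println("drawn at:  (" + drawn[0] + ", " + drawn[1] + ")");
    }
}
```

With the rotation switched off, the drawing system matches the extraction system (the identity matrix in this model), so the extracted coordinates can be used directly.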