My goal is to find JavaScript matching a given pattern in the annotations of a PDF. To do that, I came up with the following code:
public static void main(String[] args) {
    try {
        // Reads and parses a PDF document
        PdfReader reader = new PdfReader("Test.pdf");
        // For each PDF page
        for (int i = 1; i <= reader.getNumberOfPages(); i++) {
            // Get a PDF page
            PdfDictionary page = reader.getPageN(i);
            // Get all the annotations of page i
            PdfArray annotsArray = page.getAsArray(PdfName.ANNOTS);
            // If page does not have annotations
            if (page.getAsArray(PdfName.ANNOTS) == null) {
                continue;
            }
            // For each annotation
            for (int j = 0; j < annotsArray.size(); ++j) {
                // For current annotation
                PdfDictionary curAnnot = annotsArray.getAsDict(j);
                // check if has JS as described below
                PdfDictionary AnnotationAction = AnnotationDictionary.GetAsDict(PdfName.A);
                // test if it is a JavaScript action
                if (AnnotationAction.Get(PdfName.S).Equals(PdfName.JavaScript)){
                    // what here?
                }
            }
        }
    } catch (Exception e) {
        e.printStackTrace();
    }
}
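As far as I know, comparing strings is done by the StringCompare library. The problem is that it compares two whole strings, whereas I want to know whether the JavaScript action in an annotation starts with (or contains) this string: if (this.hostContainer) { try {
So, how can I check whether the JavaScript in an annotation contains the string above?
EDIT: A sample page with JS is available at: pdf with JS
Answer 0: (score: 1)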
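JavaScript actions are defined in ISO 32000-1 as follows:
12.6.4.16 JavaScript Actions
Upon invocation of a JavaScript action, a conforming processor shall execute a script that is written in the JavaScript programming language. Depending on the nature of the script, various interactive form fields in the document may update their values or change their visual appearances. Mozilla Development Center's Client-Side JavaScript Reference and the Adobe JavaScript for Acrobat API Reference (see the Bibliography) give details on the contents and effects of JavaScript scripts. Table 217 shows the action dictionary entries specific to this type of action.
Table 217 - Additional entries specific to a JavaScript action
Key | Type | Value
S | name | (Required) The type of action that this dictionary describes; shall be JavaScript for a JavaScript action.
JS | text string or text stream | (Required) A text string or text stream containing the JavaScript script to be executed. PDFDocEncoding or Unicode encoding (the latter identified by the Unicode prefix U+FEFF) shall be used to encode the contents of the string or stream.
To support the use of parameterized function calls in JavaScript scripts, the JavaScript entry in a PDF document's name dictionary (see 7.7.4, "Name Dictionary") may contain a name tree that maps name strings to document-level JavaScript actions. When the document is opened, all of the actions in this name tree shall be executed, defining JavaScript functions for use by other scripts in the document.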
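Thus, if you are interested to know whether the JavaScript action in an annotation starts with (or contains) this string: if (this.hostContainer) { try {
in the situation
if (AnnotationAction.Get(PdfName.S).Equals(PdfName.JavaScript)){
    // what here?
}
you will probably want to first check whether AnnotationAction.Get(PdfName.JS) is a PdfString or a PdfStream, in either case retrieve the content as a string, and then check whether it, or any function it calls (such a function may be defined in the JavaScript name tree), contains the string you are looking for, using the usual string comparison methods.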
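Regarding "retrieve the content as a string": Table 217 says the JS content is either PDFDocEncoding or Unicode marked by the U+FEFF prefix. The following is only a minimal sketch of that distinction using plain Java, independent of any PDF library. The class name ScriptDecoder and the method decode are hypothetical, and ISO-8859-1 is used here merely as an approximation of PDFDocEncoding (the two agree on the ASCII range but differ in some other code points).

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of iText.
class ScriptDecoder {

    /**
     * Decodes the raw bytes of a JS entry.
     * Assumption: a leading FE FF marks UTF-16BE text; anything else is
     * decoded as ISO-8859-1, which only approximates PDFDocEncoding.
     */
    public static String decode(byte[] raw) {
        boolean hasBom = raw.length >= 2
                && (raw[0] & 0xFF) == 0xFE
                && (raw[1] & 0xFF) == 0xFF;
        if (hasBom) {
            // Skip the two prefix bytes and decode the rest as UTF-16BE.
            return new String(raw, 2, raw.length - 2, StandardCharsets.UTF_16BE);
        }
        return new String(raw, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String needle = "if (this.hostContainer) { try {";
        // The same script text in both encodings.
        byte[] singleByte = needle.getBytes(StandardCharsets.ISO_8859_1);
        byte[] unicode = ("\uFEFF" + needle).getBytes(StandardCharsets.UTF_16BE);
        System.out.println(decode(singleByte).contains(needle));
        System.out.println(decode(unicode).contains(needle));
    }
}
```

Without such a check, a script stored as Unicode would decode to text with a NUL character between every letter, and a plain contains test on it would fail.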
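I took your code, cleaned it up a bit (in particular, it was a mix of C# and Java), and added code as described above that inspects the immediate JavaScript code in the annotation action element: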
System.out.println("file.pdf - Looking for special JavaScript actions.");
// Reads and parses a PDF document
PdfReader reader = new PdfReader(resource);
// For each PDF page
for (int i = 1; i <= reader.getNumberOfPages(); i++)
{
    System.out.printf("\nPage %d\n", i);
    // Get a PDF page
    PdfDictionary page = reader.getPageN(i);
    // Get all the annotations of page i
    PdfArray annotsArray = page.getAsArray(PdfName.ANNOTS);
    // If page does not have annotations
    if (annotsArray == null)
    {
        System.out.println("No annotations.");
        continue;
    }
    // For each annotation
    for (int j = 0; j < annotsArray.size(); ++j)
    {
        System.out.printf("Annotation %d - ", j);
        // For current annotation
        PdfDictionary curAnnot = annotsArray.getAsDict(j);
        // check if has JS as described below
        PdfDictionary annotationAction = curAnnot.getAsDict(PdfName.A);
        if (annotationAction == null)
        {
            System.out.print("no action");
        }
        // test if it is a JavaScript action
        else if (PdfName.JAVASCRIPT.equals(annotationAction.get(PdfName.S)))
        {
            PdfObject scriptObject = annotationAction.getDirectObject(PdfName.JS);
            if (scriptObject == null)
            {
                System.out.println("missing JS entry");
                continue;
            }
            final String script;
            if (scriptObject.isString())
                script = ((PdfString)scriptObject).toUnicodeString();
            else if (scriptObject.isStream())
            {
                try ( ByteArrayOutputStream baos = new ByteArrayOutputStream() )
                {
                    ((PdfStream)scriptObject).writeContent(baos);
                    script = baos.toString("ISO-8859-1");
                }
            }
            else
            {
                System.out.println("malformed JS entry");
                continue;
            }
            if (script.contains("if (this.hostContainer) { try {"))
                System.out.print("contains test string - ");
            System.out.printf("\n---\n%s\n---", script);
            // what here?
        }
        else
        {
            System.out.print("no JavaScript action");
        }
        System.out.println();
    }
}
(Test SearchActionJavaScript, method testSearchJsActionInFile)
The C# equivalent:
using (PdfReader reader = new PdfReader(sourcePath))
{
    Console.WriteLine("file.pdf - Looking for special JavaScript actions.");
    // For each PDF page
    for (int i = 1; i <= reader.NumberOfPages; i++)
    {
        Console.Write("\nPage {0}\n", i);
        // Get a PDF page
        PdfDictionary page = reader.GetPageN(i);
        // Get all the annotations of page i
        PdfArray annotsArray = page.GetAsArray(PdfName.ANNOTS);
        // If page does not have annotations
        if (annotsArray == null)
        {
            Console.WriteLine("No annotations.");
            continue;
        }
        // For each annotation
        for (int j = 0; j < annotsArray.Size; ++j)
        {
            Console.Write("Annotation {0} - ", j);
            // For current annotation
            PdfDictionary curAnnot = annotsArray.GetAsDict(j);
            // check if has JS as described below
            PdfDictionary annotationAction = curAnnot.GetAsDict(PdfName.A);
            if (annotationAction == null)
            {
                Console.Write("no action");
            }
            // test if it is a JavaScript action
            else if (PdfName.JAVASCRIPT.Equals(annotationAction.Get(PdfName.S)))
            {
                PdfObject scriptObject = annotationAction.GetDirectObject(PdfName.JS);
                if (scriptObject == null)
                {
                    Console.WriteLine("missing JS entry");
                    continue;
                }
                String script;
                if (scriptObject.IsString())
                    script = ((PdfString)scriptObject).ToUnicodeString();
                else if (scriptObject.IsStream())
                {
                    using (MemoryStream stream = new MemoryStream())
                    {
                        ((PdfStream)scriptObject).WriteContent(stream);
                        // MemoryStream.ToString() would only return the type name, so decode the bytes instead
                        script = System.Text.Encoding.GetEncoding("ISO-8859-1").GetString(stream.ToArray());
                    }
                }
                else
                {
                    Console.WriteLine("malformed JS entry");
                    continue;
                }
                if (script.Contains("if (this.hostContainer) { try {"))
                    Console.Write("contains test string - ");
                Console.Write("\n---\n{0}\n---", script);
                // what here?
            }
            else
            {
                Console.Write("no JavaScript action");
            }
            Console.WriteLine();
        }
    }
}
Running either version against your sample file, one gets:
file.pdf - Looking for special JavaScript actions.
Page 1
Annotation 0 - contains test string -
---
if (this.hostContainer) { try {
this.hostContainer.postMessage(['newPage', 'pp_vii', 0]);
} catch(e) { console.println(e); }};
---
Annotation 1 - contains test string -
---
if (this.hostContainer) { try {
this.hostContainer.postMessage(['newPage', 'pp_ix', 0]);
} catch(e) { console.println(e); }};
---
Annotation 2 - contains test string -
---
if (this.hostContainer) { try {
this.hostContainer.postMessage(['newPage', 'pp_xi', 0]);
} catch(e) { console.println(e); }};
---
Annotation 3 - contains test string -
---
if (this.hostContainer) { try {
this.hostContainer.postMessage(['newPage', 'pp_3', 0]);
} catch(e) { console.println(e); }};
---
Annotation 4 - contains test string -
---
if (this.hostContainer) { try {
this.hostContainer.postMessage(['newPage', 'pp_15', 0]);
} catch(e) { console.println(e); }};
---
Annotation 5 - contains test string -
---
if (this.hostContainer) { try {
this.hostContainer.postMessage(['newPage', 'pp_37', 0]);
} catch(e) { console.println(e); }};
---
Annotation 6 - contains test string -
---
if (this.hostContainer) { try {
this.hostContainer.postMessage(['newPage', 'pp_57', 0]);
} catch(e) { console.println(e); }};
---
Annotation 7 - contains test string -
---
if (this.hostContainer) { try {
this.hostContainer.postMessage(['newPage', 'pp_81', 0]);
} catch(e) { console.println(e); }};
---
Annotation 8 - contains test string -
---
if (this.hostContainer) { try {
this.hostContainer.postMessage(['newPage', 'pp_111', 0]);
} catch(e) { console.println(e); }};
---
Annotation 9 - contains test string -
---
if (this.hostContainer) { try {
this.hostContainer.postMessage(['newPage', 'pp_136', 0]);
} catch(e) { console.println(e); }};
---
Annotation 10 - contains test string -
---
if (this.hostContainer) { try {
this.hostContainer.postMessage(['newPage', 'pp_160', 0]);
} catch(e) { console.println(e); }};
---
Annotation 11 - contains test string -
---
if (this.hostContainer) { try {
this.hostContainer.postMessage(['newPage', 'pp_197', 0]);
} catch(e) { console.println(e); }};
---
Annotation 12 - contains test string -
---
if (this.hostContainer) { try {
this.hostContainer.postMessage(['newPage', 'pp_179', 0]);
} catch(e) { console.println(e); }};
---
Annotation 13 - contains test string -
---
if (this.hostContainer) { try {
this.hostContainer.postMessage(['newPage', 'pp_201', 0]);
} catch(e) { console.println(e); }};
---
Annotation 14 - contains test string -
---
if (this.hostContainer) { try {
this.hostContainer.postMessage(['newPage', 'pp_223', 0]);
} catch(e) { console.println(e); }};
---
Page 2
No annotations.
Page 3
No annotations.
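The question asked about "starts with (or contains)", while the code above only tests contains with an exact literal. If that distinction matters, or if the spacing in other files might differ from the sample, something along the following lines could be used. This is just a sketch in plain Java; the class name ScriptMatcher, its method names, and the choice to tolerate arbitrary whitespace between tokens are assumptions for illustration, not anything required by the PDF specification or by iText.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of iText.
class ScriptMatcher {

    // Whitespace-tolerant form of: if (this.hostContainer) { try {
    private static final Pattern TEST = Pattern.compile(
            "if\\s*\\(\\s*this\\.hostContainer\\s*\\)\\s*\\{\\s*try\\s*\\{");

    /** True if the script, ignoring leading whitespace, begins with the test pattern. */
    public static boolean startsWithTest(String script) {
        // lookingAt() anchors the match at the start of the input only.
        return TEST.matcher(script.trim()).lookingAt();
    }

    /** True if the test pattern occurs anywhere in the script. */
    public static boolean containsTest(String script) {
        return TEST.matcher(script).find();
    }

    public static void main(String[] args) {
        String fromSample = "if (this.hostContainer) { try {\n"
                + "this.hostContainer.postMessage(['newPage', 'pp_vii', 0]);\n"
                + "} catch(e) { console.println(e); }};";
        String prefixed = "var x = 1; " + fromSample;
        System.out.println(startsWithTest(fromSample) + " " + containsTest(fromSample));
        System.out.println(startsWithTest(prefixed) + " " + containsTest(prefixed));
    }
}
```

In the annotation loop, script.contains(...) could then be swapped for one of these two calls, depending on which of the two checks is actually wanted.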