I have a file saved in .doc format and I need to extract the highlighted text from it. I have the following code:
HWPFDocument document = new HWPFDocument(fis);
Range r = document.getRange();
for (int i = 0; i < 5; i++) {
    CharacterRun t = r.getCharacterRun(i);
    System.out.println(t.isHighlighted());
    System.out.println(t.getHighlightedColor());
    System.out.println(r.getCharacterRun(i).SPRM_HIGHLIGHT);
    System.out.println(r.getCharacterRun(i));
}
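None of the calls above report any highlighting, yet the text is highlighted when I open the file. What could be the reason, and how can I find out whether text is highlighted?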
Answer 0: (score: 1)
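There are two different ways to make text stand out in Word. The first is applying highlighting to text runs. The second is applying shading to words or paragraphs.
For the first, with the *.doc binary file format of Word, apache poi provides methods in CharacterRun. For the second, apache poi provides Paragraph.getShading, but this is only set if the shading applies to the whole paragraph. If the shading is applied to single runs only, apache poi provides nothing for it, which is probably why isHighlighted() reports false for text that looks highlighted. So the underlying SprmOperation objects have to be used.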
Microsoft's documentation 2.6.1 Character Properties describes sprmCShd80 (0x4866), which is "a Shd80 structure that specifies the background shading for the text." So that is the sprm we need to search for.
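To see why 0x4866 marks a character property with a 2-byte operand, the opcode can be split into its bit fields. This is only a sketch: SprmOpcode is a hypothetical helper, not part of apache poi, and the field layout (ispmd, fSpec, sgc, spra) is taken from the Sprm structure in Microsoft's [MS-DOC] documentation, so please check it against the specification.

```java
// Hypothetical helper (not an apache poi class): splits a 16-bit sprm opcode
// into the fields of the Sprm structure described in [MS-DOC].
final class SprmOpcode {

    // Bits 0-8: ispmd, the property identifier within its group.
    static int ispmd(int sprm) {
        return sprm & 0x01FF;
    }

    // Bit 9: fSpec, set when the property needs special handling.
    static boolean fSpec(int sprm) {
        return ((sprm >> 9) & 0x1) == 1;
    }

    // Bits 10-12: sgc, the property group (1 = paragraph, 2 = character, ...).
    static int sgc(int sprm) {
        return (sprm >> 10) & 0x7;
    }

    // Bits 13-15: spra, the operand size code (2 = operand is 2 bytes long).
    static int spra(int sprm) {
        return (sprm >> 13) & 0x7;
    }

    public static void main(String[] args) {
        int sprmCShd80 = 0x4866;
        // prints "ispmd=0x66 fSpec=false sgc=2 spra=2"
        System.out.printf("ispmd=0x%02X fSpec=%b sgc=%d spra=%d%n",
                ispmd(sprmCShd80), fSpec(sprmCShd80), sgc(sprmCShd80), spra(sprmCShd80));
    }
}
```

So sgc = 2 says this is a character property, and spra = 2 says its operand is 2 bytes long, which matches the size of a Shd80 structure.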
Example:
import java.io.FileInputStream;
import java.lang.reflect.Field;
import org.apache.poi.hwpf.HWPFDocument;
import org.apache.poi.hwpf.sprm.*;
import org.apache.poi.hwpf.usermodel.*;
public class HWPFInspectBgColor {

    // Debugging helper: prints every SprmOperation stored for the run.
    private static void showCharacterRunInternals(CharacterRun run) throws Exception {
        // _chpx is not public, so it has to be read via reflection
        Field _chpx = CharacterRun.class.getDeclaredField("_chpx");
        _chpx.setAccessible(true);
        SprmBuffer sprmBuffer = (SprmBuffer) _chpx.get(run);
        for (SprmIterator sprmIterator = sprmBuffer.iterator(); sprmIterator.hasNext(); ) {
            SprmOperation sprmOperation = sprmIterator.next();
            System.out.println(sprmOperation);
        }
    }

    // Returns the run's sprmCShd80 operation, or null if the run has no shading sprm.
    static SprmOperation getCharacterRunShading(CharacterRun run) throws Exception {
        SprmOperation shd80Operation = null;
        // Neither field is public, so both are read via reflection
        Field _chpx = CharacterRun.class.getDeclaredField("_chpx");
        _chpx.setAccessible(true);
        Field _value = SprmOperation.class.getDeclaredField("_value");
        _value.setAccessible(true);
        SprmBuffer sprmBuffer = (SprmBuffer) _chpx.get(run);
        for (SprmIterator sprmIterator = sprmBuffer.iterator(); sprmIterator.hasNext(); ) {
            SprmOperation sprmOperation = sprmIterator.next();
            short sprmValue = (short) _value.get(sprmOperation);
            if (sprmValue == (short) 0x4866) { // we have a Shd80 structure, see https://msdn.microsoft.com/en-us/library/dd947480(v=office.12).aspx
                shd80Operation = sprmOperation;
            }
        }
        return shd80Operation;
    }

    public static void main(String[] args) throws Exception {
        // try-with-resources closes the input stream even if parsing fails
        try (FileInputStream fis = new FileInputStream("sample.doc")) {
            HWPFDocument document = new HWPFDocument(fis);
            Range range = document.getRange();
            for (int p = 0; p < range.numParagraphs(); p++) {
                Paragraph paragraph = range.getParagraph(p);
                System.out.println(paragraph);
                // Shading applied to the whole paragraph
                if (!paragraph.getShading().isEmpty()) {
                    System.out.println("Paragraph's shading: " + paragraph.getShading());
                }
                for (int r = 0; r < paragraph.numCharacterRuns(); r++) {
                    CharacterRun run = paragraph.getCharacterRun(r);
                    System.out.println(run);
                    // Highlighting applied to the run
                    if (run.isHighlighted()) {
                        System.out.println("Run's highlighted color: " + run.getHighlightedColor());
                    }
                    // Shading applied to this run only (sprmCShd80)
                    SprmOperation shading = getCharacterRunShading(run);
                    if (shading != null) {
                        System.out.println("Run's Shd80 structure: " + shading);
                    }
                }
            }
        }
    }
}
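
Once the sprmCShd80 operation has been found, its 16-bit operand still has to be interpreted. Here is a minimal sketch of that step, assuming the Shd80 layout from [MS-DOC] (icoFore in bits 0-4, icoBack in bits 5-9, ipat in bits 10-15, and 0xFFFF meaning "no shading") and the Ico color index table from the same specification. Shd80Decoder and its method names are hypothetical and not part of apache poi.

```java
// Hypothetical decoder (not an apache poi class) for the 16-bit Shd80 operand.
final class Shd80Decoder {

    // Ico color names by index as listed in [MS-DOC]; verify against the spec.
    private static final String[] ICO_NAMES = {
        "auto", "black", "blue", "cyan", "green", "magenta", "red", "yellow",
        "white", "dark blue", "dark cyan", "dark green", "dark magenta",
        "dark red", "dark yellow", "dark gray", "light gray"
    };

    // 0xFFFF is the special Shd80Nil value: no shading is applied.
    static boolean isNil(int shd80) {
        return (shd80 & 0xFFFF) == 0xFFFF;
    }

    // Bits 0-4: foreground color index of the shading pattern.
    static int icoFore(int shd80) {
        return shd80 & 0x1F;
    }

    // Bits 5-9: background color index, i.e. the visible shading color.
    static int icoBack(int shd80) {
        return (shd80 >> 5) & 0x1F;
    }

    // Bits 10-15: shading pattern index.
    static int ipat(int shd80) {
        return (shd80 >> 10) & 0x3F;
    }

    // Maps a color index to a readable name, "unknown" if out of range.
    static String colorName(int ico) {
        return (ico >= 0 && ico < ICO_NAMES.length) ? ICO_NAMES[ico] : "unknown";
    }

    public static void main(String[] args) {
        int operand = 0x00E0; // sample value: icoBack = 7, everything else 0
        // prints "background=yellow foreground=auto ipat=0 nil=false"
        System.out.println("background=" + colorName(icoBack(operand))
                + " foreground=" + colorName(icoFore(operand))
                + " ipat=" + ipat(operand)
                + " nil=" + isNil(operand));
    }
}
```

The operand itself would have to come from the SprmOperation returned by getCharacterRunShading. If your apache poi version exposes SprmOperation.getOperand(), its low 16 bits should be usable as the input here, but I have not checked this across versions.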