Apache POI does not find highlighted text

Time: 2018-11-21 03:52:10

Tags: java apache-poi highlight doc

I have a file saved in doc format and I need to extract the highlighted text. I have the following code:

HWPFDocument document = new HWPFDocument(fis);
Range r = document.getRange();
for (int i = 0; i < 5; i++) {
    CharacterRun t = r.getCharacterRun(i);
    System.out.println(t.isHighlighted());
    System.out.println(t.getHighlightedColor());
    // SPRM_HIGHLIGHT is a static constant, so this prints the same opcode for every run
    System.out.println(r.getCharacterRun(i).SPRM_HIGHLIGHT);
    System.out.println(r.getCharacterRun(i));
}

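None of the calls above report the text as highlighted, yet the text is highlighted when I open the file. What could be the reason, and how can I find out whether the text is highlighted?

1 Answer:

Answer 0: (score: 1)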

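There are two different ways to make text look highlighted in Word. The first is applying highlighting to text runs. The second is applying shading to words or paragraphs.

For the first, when the *.doc binary Word file format is used, apache poi provides methods in CharacterRun (isHighlighted and getHighlightedColor). For the second, apache poi provides Paragraph.getShading. But this is only set if the shading applies to the whole paragraph. If the shading is applied to single runs only, apache poi provides nothing for that, so the underlying SprmOperations have to be used.

Microsoft's documentation, 2.6.1 Character Properties, describes sprmCShd80 (0x4866), which is "A Shd80 structure that specifies the background shading for the text.". So that is the SPRM we need to search for in each character run.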
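To make the value 0x4866 less opaque, here is a minimal sketch that splits a 16-bit SPRM opcode into its bit fields. It assumes the Sprm layout from the [MS-DOC] specification (9 bits ispmd, 1 bit fSpec, 3 bits sgc, 3 bits spra); the class and method names are made up for illustration and are not part of apache poi.

```java
public class SprmOpcode {

    // Bits 0-8: identifier of the property within its group.
    static int ispmd(int sprm) {
        return sprm & 0x01FF;
    }

    // Bit 9: set when the property needs special handling.
    static boolean fSpec(int sprm) {
        return (sprm & 0x0200) != 0;
    }

    // Bits 10-12: property group (1 = paragraph, 2 = character, ...).
    static int sgc(int sprm) {
        return (sprm >> 10) & 0x07;
    }

    // Bits 13-15: operand size code (1 = one byte, 2 = two bytes, ...).
    static int spra(int sprm) {
        return (sprm >> 13) & 0x07;
    }

    public static void main(String[] args) {
        int sprmCShd80 = 0x4866;
        System.out.printf("ispmd=0x%02X fSpec=%b sgc=%d spra=%d%n",
                ispmd(sprmCShd80), fSpec(sprmCShd80), sgc(sprmCShd80), spra(sprmCShd80));
    }
}
```

For 0x4866 this gives sgc = 2 (a character property) and spra = 2 (a two-byte operand), which matches the description of a 16-bit Shd80 structure attached to a run.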

Example:

import java.io.FileInputStream;
import java.lang.reflect.Field;

import org.apache.poi.hwpf.HWPFDocument;
import org.apache.poi.hwpf.sprm.*;
import org.apache.poi.hwpf.usermodel.*;

public class HWPFInspectBgColor {

 // Debugging helper, not called from main: prints every SPRM stored for a character run.
 // _chpx is a non-public field of CharacterRun, hence the reflection.
 private static void showCharacterRunInternals(CharacterRun run) throws Exception {
  Field _chpx = CharacterRun.class.getDeclaredField("_chpx");
  _chpx.setAccessible(true);
  SprmBuffer sprmBuffer = (SprmBuffer) _chpx.get(run);
  for (SprmIterator sprmIterator = sprmBuffer.iterator(); sprmIterator.hasNext(); ) {
   SprmOperation sprmOperation = sprmIterator.next();
   System.out.println(sprmOperation);
  }
 }

 // Returns the run's sprmCShd80 operation, or null if the run has no such shading.
 // This relies on private fields and may break with other apache poi versions.
 static SprmOperation getCharacterRunShading(CharacterRun run) throws Exception {
  SprmOperation shd80Operation = null;
  Field _chpx = CharacterRun.class.getDeclaredField("_chpx");
  _chpx.setAccessible(true);
  Field _value = SprmOperation.class.getDeclaredField("_value");
  _value.setAccessible(true);
  SprmBuffer sprmBuffer = (SprmBuffer) _chpx.get(run);
  for (SprmIterator sprmIterator = sprmBuffer.iterator(); sprmIterator.hasNext(); ) {
   SprmOperation sprmOperation = sprmIterator.next();
   short sprmValue = _value.getShort(sprmOperation);
   if (sprmValue == (short)0x4866) { // we have a Shd80 structure, see https://msdn.microsoft.com/en-us/library/dd947480(v=office.12).aspx
    shd80Operation = sprmOperation;
   }
  }
  return shd80Operation;
 }

 public static void main(String[] args) throws Exception {
  // try-with-resources closes the input stream even if parsing fails
  try (FileInputStream fis = new FileInputStream("sample.doc")) {
   HWPFDocument document = new HWPFDocument(fis);
   Range range = document.getRange();
   for (int p = 0; p < range.numParagraphs(); p++) {
    Paragraph paragraph = range.getParagraph(p);
    System.out.println(paragraph);
    // shading applied to the whole paragraph
    if (!paragraph.getShading().isEmpty()) {
     System.out.println("Paragraph's shading: " + paragraph.getShading());
    }

    for (int r = 0; r < paragraph.numCharacterRuns(); r++) {
     CharacterRun run = paragraph.getCharacterRun(r);
     System.out.println(run);
     // highlighting applied to the run
     if (run.isHighlighted()) {
      System.out.println("Run's highlighted color: " + run.getHighlightedColor());
     }
     // shading applied to the run only
     SprmOperation shd80Operation = getCharacterRunShading(run);
     if (shd80Operation != null) {
      System.out.println("Run's Shd80 structure: " + shd80Operation);
     }
    }
   }
  }
 }
}
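The example above only prints the Shd80 operation. If the colors themselves are needed, the 16-bit operand can be split into its fields. The following is a rough sketch, assuming the Shd80 layout from the [MS-DOC] specification (5 bits icoFore, 5 bits icoBack, 6 bits ipat) and assuming the operand was read with something like SprmOperation.getOperand(). The class name and the sample operand are hypothetical.

```java
public class Shd80Decoder {

    // Bits 0-4: palette index of the pattern's foreground color.
    static int icoFore(int shd80) {
        return shd80 & 0x1F;
    }

    // Bits 5-9: palette index of the background color.
    static int icoBack(int shd80) {
        return (shd80 >> 5) & 0x1F;
    }

    // Bits 10-15: shading pattern.
    static int ipat(int shd80) {
        return (shd80 >> 10) & 0x3F;
    }

    public static void main(String[] args) {
        // Hypothetical operand: icoFore = 1, icoBack = 7, ipat = 1.
        int operand = 0x04E1;
        System.out.printf("icoFore=%d icoBack=%d ipat=%d%n",
                icoFore(operand), icoBack(operand), ipat(operand));
        // prints "icoFore=1 icoBack=7 ipat=1"
    }
}
```

The masks make the methods safe even if the operand arrives as a sign-extended short. The resulting indices refer to Word's fixed color palette (the Ico values in the specification), not to RGB values.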