How to add multiple equations inline with text in Apache POI Word?

Time: 2017-10-09 10:39:22

Tags: java apache apache-poi mathml

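I am using Apache POI to convert text containing LaTeX-style equations into an MS Word document. With some help I was able to get it working, but if a line contains more than one equation, it produces an incorrect result.

Here is my code: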

import java.io.*;
import org.apache.poi.xwpf.usermodel.*;

import org.openxmlformats.schemas.wordprocessingml.x2006.main.CTP;
import org.openxmlformats.schemas.officeDocument.x2006.math.CTOMath;
import org.openxmlformats.schemas.officeDocument.x2006.math.CTOMathPara;

import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamSource;
import javax.xml.transform.stream.StreamResult;

import uk.ac.ed.ph.snuggletex.SnuggleInput;
import uk.ac.ed.ph.snuggletex.SnuggleEngine;
import uk.ac.ed.ph.snuggletex.SnuggleSession;

import java.io.IOException;

public class CreateWordFormulaFromMathML {

 static File stylesheet = new File("MML2OMML.XSL");
 static TransformerFactory tFactory = TransformerFactory.newInstance();
 static StreamSource stylesource = new StreamSource(stylesheet); 

 static CTOMath getOMML(String mathML) throws Exception {
  Transformer transformer = tFactory.newTransformer(stylesource);

  StringReader stringreader = new StringReader(mathML);
  StreamSource source = new StreamSource(stringreader);

  StringWriter stringwriter = new StringWriter();
  StreamResult result = new StreamResult(stringwriter);
  transformer.transform(source, result);

  String ooML = stringwriter.toString();
  stringwriter.close();

  CTOMath ctOMath = CTOMath.Factory.parse(ooML);
  return ctOMath.getOMathArray(0);
 }

 public static void main(String[] args) throws Exception {

  XWPFDocument document = new XWPFDocument();

  String mstr = "The expression is as: $ax^2 + bx = c$ is easier to understand than $$ax^2 + \\frac{\\sin^{-1}\\theta}{\\cot{-1}} \\times y_1$$ or anything in \\[ ay^2 + b_2 \\theta\\]";

  XWPFParagraph paragraph = document.createParagraph();
  XWPFRun run = paragraph.createRun();
 // run.setText("");

  SnuggleEngine engine = new SnuggleEngine();
  SnuggleSession session = engine.createSession();

  SnuggleInput input = new SnuggleInput(mstr);
  session.parseInput(input);

  String mathML = session.buildXMLString();
  System.out.println("Input " + input.getString() + " was converted to:\n" + mathML + "\n\n");


  for (String s : mathML.split("\\s+(?=<math)|(?<=</math>)\\s+")) {
   if (s.startsWith("<math")) {
    CTOMath ctOMath = getOMML(s);
    System.out.println(s);

    CTP ctp = paragraph.getCTP();
    ctp.setOMathArray(new CTOMath[]{ctOMath});
   } else {
    run.setText(s + " ");
    System.out.println(s);
   }
  }

  document.write(new FileOutputStream("CreateWordFormulaFromMathML.docx"));
  document.close();

 }
}

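This generates a document containing:

The expression is as: is easier to understand than or anything in ay^2 + b_2 \theta

Note: (ay^2 + b_2 \theta) is rendered correctly in Word equation format.

What I need is inline text with multiple equations in between.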

1 Answer:

Answer 0: (score: 1)

How do we approach tasks such as creating Office OpenXML files like *.docx files?

*.docx files, like other Office OpenXML files, are ZIP archives. We can unzip them and look at the internals. Inside a *.docx archive we find /word/document.xml, which describes the document structure.
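The inspection step can also be done from Java instead of unzipping by hand. The following is only a minimal sketch using java.util.zip from the standard library; the class name DocxEntryReader, the helper readEntry, and the default file name are illustrative assumptions, and it assumes the XML part is UTF-8 encoded.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

public class DocxEntryReader {

    // Returns the text of the named ZIP entry, or null if the archive has no such entry.
    // Assumption: the entry is UTF-8 encoded text.
    public static String readEntry(InputStream zipStream, String entryName) throws IOException {
        try (ZipInputStream zip = new ZipInputStream(zipStream)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (entry.getName().equals(entryName)) {
                    ByteArrayOutputStream out = new ByteArrayOutputStream();
                    byte[] buffer = new byte[8192];
                    int n;
                    // read() returns -1 once the current entry is exhausted
                    while ((n = zip.read(buffer)) != -1) {
                        out.write(buffer, 0, n);
                    }
                    return new String(out.toByteArray(), StandardCharsets.UTF_8);
                }
            }
        }
        return null;
    }

    public static void main(String[] args) throws IOException {
        // Assumption: the .docx to inspect is given as the first argument,
        // otherwise the file written by the question's code is used.
        String path = args.length > 0 ? args[0] : "CreateWordFormulaFromMathML.docx";
        try (InputStream in = Files.newInputStream(Paths.get(path))) {
            System.out.println(readEntry(in, "word/document.xml"));
        }
    }
}
```

Note that inside the archive the entry name has no leading slash, so the lookup uses word/document.xml.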

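For a paragraph with inline formulas, document.xml contains something like:

<w:p>
 <w:r>
  <w:t>text</w:t>
 </w:r>
 <m:oMath>...
 </m:oMath>
 <w:r>
  <w:t>text</w:t>
 </w:r>
 <m:oMath>...
 </m:oMath>
 ...
</w:p>

So we need multiple runs to hold the text, and between them multiple <m:oMath>... </m:oMath> elements.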
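To make the alternation concrete, here is a small library-free sketch that applies the same split expression as the question's loop and lists which kind of paragraph child each segment should become. The class name InlineSegments, the method plan, and the simplified sample string are illustrative assumptions; real SnuggleTeX output carries namespaces and more markup.

```java
import java.util.ArrayList;
import java.util.List;

public class InlineSegments {

    // Same split expression as in the question: break before "<math" and after "</math>".
    static final String SPLIT = "\\s+(?=<math)|(?<=</math>)\\s+";

    // Maps each segment to the paragraph child it should become, in document order:
    // "w:r" for a text run, "m:oMath" for a formula.
    public static List<String> plan(String mixed) {
        List<String> children = new ArrayList<>();
        for (String s : mixed.split(SPLIT)) {
            if (s.isEmpty()) {
                continue; // skip an empty leading segment
            }
            children.add(s.startsWith("<math") ? "m:oMath" : "w:r");
        }
        return children;
    }

    public static void main(String[] args) {
        // Hypothetical, simplified input standing in for the SnuggleTeX output.
        String mixed = "The expression <math><mi>a</mi></math> is easier than "
                + "<math><mi>b</mi></math> or anything";
        System.out.println(plan(mixed)); // prints [w:r, m:oMath, w:r, m:oMath, w:r]
    }
}
```

Each "w:r" entry corresponds to its own run, which is why a single shared run cannot reproduce this order.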

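That is why the paragraph has an OMathArray. Your code overwrites this array with a new one-element CTOMath[] every time it finds another formula, so only the last formula survives. Instead, each newly found CTOMath needs to be appended to the existing array.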
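The difference between overwriting and appending can be modelled without POI at all. This is only an illustration with plain String arrays standing in for CTOMath[]; the class and method names are made up for the sketch and are not part of any library.

```java
import java.util.Arrays;

public class OverwriteVsAppend {

    // Models replacing the whole array with a one-element array:
    // whatever was stored before is discarded.
    public static String[] overwrite(String[] current, String formula) {
        return new String[] { formula };
    }

    // Models growing the array by one slot and filling that last slot.
    public static String[] append(String[] current, String formula) {
        String[] grown = Arrays.copyOf(current, current.length + 1);
        grown[grown.length - 1] = formula;
        return grown;
    }

    public static void main(String[] args) {
        String[] formulas = { "f1", "f2", "f3" }; // placeholders for three equations
        String[] overwritten = new String[0];
        String[] appended = new String[0];
        for (String f : formulas) {
            overwritten = overwrite(overwritten, f);
            appended = append(appended, f);
        }
        System.out.println(overwritten.length); // prints 1 (only "f3" is left)
        System.out.println(appended.length);    // prints 3
    }
}
```

With three formulas in the input, the overwriting variant ends up holding only the last one, which matches the document the question describes.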

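To learn what we can do with an org.openxmlformats.schemas.wordprocessingml.x2006.main.CTP paragraph, we need documentation for it. The best I have found is grepcode.com. There we find CTP.addNewOMath() and CTP.setOMathArray(int, CTOMath).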

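Change your loop as follows: for each formula, call ctp.addNewOMath() and then ctp.setOMathArray(ctp.sizeOfOMathArray() - 1, ctOMath), instead of ctp.setOMathArray(new CTOMath[]{ctOMath}). For each text segment, create a new run with paragraph.createRun() before calling run.setText(s + " "), so that the text runs and the formulas alternate in document order.

That should work.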