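I can parse a Word document by table/row/cell and get the text out of the cells, unless the cell has a content control (a drop-down). If a content control is present, nothing is pulled out. I have tried getting anything with Text.class or Tc.class, and even though the text is part of that XML block, it is not found.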
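I went through the class types in docx4j.wml and tried several that seemed appropriate. CTSdtCell finds the block of XML I need, but I can't do much with it.
From the output, it finds the sdt content but not the cell (w:tc) inside it. And if the cell isn't found, the text (w:t) isn't found either.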
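The document has nine rows. I removed all content controls from the first two rows and left the remaining seven unchanged. When the loop reaches a row with content controls, it doesn't see those cells as cells (w:tc), only as content controls (w:sdt) with no cells inside them.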
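One way to confirm what the row actually contains is to print the type of each direct child of a `Tr` instead of searching recursively. This is a sketch, assuming docx4j's `org.docx4j.XmlUtils.unwrap` (which returns the value of a `JAXBElement`, or the object itself otherwise) and that a row's children are reachable through `Tr.getContent()`; the class and method names `RowInspector`/`childTypes` are hypothetical. For a row with content controls, the wrapped cells should show up as `CTSdtCell` rather than `Tc`.

```java
import java.util.ArrayList;
import java.util.List;

import org.docx4j.XmlUtils;
import org.docx4j.wml.Tr;

// Hypothetical diagnostic helper: lists what a table row directly contains.
public class RowInspector {

    /** Returns the simple class name of each direct child of the given row. */
    public static List<String> childTypes(Tr row) {
        List<String> names = new ArrayList<>();
        for (Object child : row.getContent()) {
            // Cells are usually stored as JAXBElement<Tc>; unwrap before inspecting.
            Object value = XmlUtils.unwrap(child);
            names.add(value.getClass().getSimpleName());
        }
        return names;
    }
}
```

Called on each element of `rowsList`, this separates plain `Tc` children from `CTSdtCell` wrappers, which is the distinction the counts in the output reflect.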
```java
import java.io.File;
import java.util.List;

import org.docx4j.openpackaging.packages.WordprocessingMLPackage;
import org.docx4j.openpackaging.parts.WordprocessingML.MainDocumentPart;
import org.docx4j.wml.CTSdtCell;
import org.docx4j.wml.Tc;
import org.docx4j.wml.Tr;

public class ReadWordDocTest implements Utilities {

    private static final String OUTLOOK_DOC_PATH = System.getProperty("user.home") + "\\workspace\\Test\\Projects\\";

    public static void main(String[] args) throws Exception {
        new ReadWordDocTest();
    }

    public ReadWordDocTest() throws Exception {
        String documentFilename = "ATL.docx";
        WordprocessingMLPackage mlp = WordprocessingMLPackage.load(new File(OUTLOOK_DOC_PATH + documentFilename));
        MainDocumentPart mdp = mlp.getMainDocumentPart();

        List<Object> rowsList = getAllElementFromObject(mdp, Tr.class);
        rowsList.subList(0, 2).clear(); // Header stuff. Skip.

        // Rows
        for (Object row : rowsList) {
            List<Object> cellsList = getAllElementFromObject(row, Tc.class);
            List<Object> sdtObjList = getAllElementFromObject(row, CTSdtCell.class);
            System.out.println("Cells " + cellsList.size() + " Content control " + sdtObjList.size());
        }
    }
}
```
Output:

```
Cells 7 Content control 0
Cells 7 Content control 0
Cells 3 Content control 4
Cells 3 Content control 4
Cells 3 Content control 4
Cells 3 Content control 4
Cells 3 Content control 4
Cells 3 Content control 4
Cells 3 Content control 4
```
Sample XML from a cell that uses a content control (namespace declarations other than `w` omitted):

```xml
<w:sdt xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">
    <w:sdtPr>
        <w:rPr>
            <w:sz w:val="18"/>
            <w:szCs w:val="18"/>
        </w:rPr>
        <w:id w:val="1239367024"/>
        <w:placeholder>
            <w:docPart w:val="059F92C89F2F410BB7231E2BAA981321"/>
        </w:placeholder>
        <w:date>
            <w:dateFormat w:val="M/d/yyyy"/>
            <w:lid w:val="en-US"/>
            <w:storeMappedDataAs w:val="dateTime"/>
            <w:calendar w:val="gregorian"/>
        </w:date>
    </w:sdtPr>
    <w:sdtContent>
        <w:tc>
            <w:tcPr>
                <w:tcW w:w="1170" w:type="dxa"/>
            </w:tcPr>
            <w:p w:rsidRPr="007D4D1F" w:rsidR="00040B4E" w:rsidP="00040B4E" w:rsidRDefault="00040B4E">
                <w:pPr>
                    <w:ind w:left="0" w:firstLine="0"/>
                    <w:jc w:val="center"/>
                    <w:cnfStyle w:val="000000000000"/>
                    <w:rPr>
                        <w:sz w:val="18"/>
                        <w:szCs w:val="18"/>
                    </w:rPr>
                </w:pPr>
                <w:r>
                    <w:rPr>
                        <w:sz w:val="18"/>
                        <w:szCs w:val="18"/>
                    </w:rPr>
                    <w:t>02/01/2019</w:t>
                </w:r>
            </w:p>
        </w:tc>
    </w:sdtContent>
</w:sdt>
```
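Once a `Tc` has been reached, such as the one inside `w:sdtContent` in the sample, its text sits in `w:t` elements under `w:p` and `w:r`. Below is a minimal sketch for reading it, assuming docx4j's `XmlUtils.unwrap` and that `Tc`, `P` and `R` implement `ContentAccessor`; `CellText` and `textOf` are hypothetical names, not part of docx4j.

```java
import org.docx4j.XmlUtils;
import org.docx4j.wml.ContentAccessor;
import org.docx4j.wml.Tc;
import org.docx4j.wml.Text;

// Hypothetical helper: concatenates the text of every w:t below a cell.
public class CellText {

    public static String textOf(Tc cell) {
        StringBuilder sb = new StringBuilder();
        collect(cell, sb);
        return sb.toString();
    }

    private static void collect(Object node, StringBuilder sb) {
        // Runs usually hold their w:t children as JAXBElement<Text>; unwrap first.
        Object value = XmlUtils.unwrap(node);
        if (value instanceof Text) {
            sb.append(((Text) value).getValue());
        } else if (value instanceof ContentAccessor) {
            // Tc -> P -> R -> Text: descend through each container's content list.
            for (Object child : ((ContentAccessor) value).getContent()) {
                collect(child, sb);
            }
        }
    }
}
```

For the `w:tc` in the sample, `textOf` would be expected to return `02/01/2019`; the property elements (`w:tcPr`, `w:pPr`, `w:rPr`) are held in separate fields rather than in the content lists, so they contribute no text.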
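The method in the `Utilities` interface:

```java
import java.util.ArrayList;
import java.util.List;

import javax.xml.bind.JAXBElement; // jakarta.xml.bind.JAXBElement in newer docx4j releases

import org.docx4j.wml.ContentAccessor;

public interface Utilities {

    default List<Object> getAllElementFromObject(Object obj, Class<?> toSearch) {
        List<Object> result = new ArrayList<>();
        // Unwrap JAXBElement wrappers (e.g. JAXBElement<Tc>) to reach the real object.
        if (obj instanceof JAXBElement)
            obj = ((JAXBElement<?>) obj).getValue();
        if (obj.getClass().equals(toSearch)) {
            result.add(obj);
        } else if (obj instanceof ContentAccessor) {
            // Recurse into children; objects that are not ContentAccessors are not searched.
            List<?> children = ((ContentAccessor) obj).getContent();
            for (Object child : children) {
                result.addAll(getAllElementFromObject(child, toSearch));
            }
        }
        return result;
    }
}
```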
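Answer (score: 0)
Per JasonPlutext's response, CTSdtCell does not implement ContentAccessor, so the recursive search stops there. Route through SdtElement instead and use its getSdtContent() method.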
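A sketch of the recursive search with that change applied. It assumes, as the answer describes, that `CTSdtCell` implements `org.docx4j.wml.SdtElement` and that the object returned by `getSdtContent()` is itself a `ContentAccessor`. `ElementFinder` is a hypothetical holder class, and `XmlUtils.unwrap` stands in for the manual `JAXBElement` check of the original method.

```java
import java.util.ArrayList;
import java.util.List;

import org.docx4j.XmlUtils;
import org.docx4j.wml.ContentAccessor;
import org.docx4j.wml.SdtElement;

// Hypothetical holder for a content-control-aware version of the search.
public class ElementFinder {

    public static List<Object> getAllElementFromObject(Object obj, Class<?> toSearch) {
        List<Object> result = new ArrayList<>();
        obj = XmlUtils.unwrap(obj); // JAXBElement<X> -> X; other objects unchanged
        if (obj.getClass().equals(toSearch)) {
            result.add(obj);
        } else if (obj instanceof SdtElement) {
            // w:sdt is not a ContentAccessor; its children live in w:sdtContent.
            Object content = ((SdtElement) obj).getSdtContent();
            if (content != null) {
                result.addAll(getAllElementFromObject(content, toSearch));
            }
        } else if (obj instanceof ContentAccessor) {
            for (Object child : ((ContentAccessor) obj).getContent()) {
                result.addAll(getAllElementFromObject(child, toSearch));
            }
        }
        return result;
    }
}
```

Checking `SdtElement` before `ContentAccessor` means every kind of content control is entered through its `w:sdtContent`. With this logic in place of the interface's default method, the rows that printed `Cells 3 Content control 4` should count their wrapped cells too when searching for `Tc.class`, while a search for `CTSdtCell.class` still stops at the wrapper itself.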