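I'm trying to build a solution that lets users upload Word DOCX template files with mail merge enabled. Ideally, the system would read the DOCX file, extract the XML, find the mail merge fields, and save them to a database for mapping down the road. I may end up using a SOAP service such as Zend LiveDocX or PHPDOCX, or something else entirely, but for now I need to figure out how to identify the fields in a DOCX file. To that end, I started from this article: http://dfmaxwell.wordpress.com/2012/02/24/using-php-to-process-a-word-document-mail-merge/
I've adjusted it a bit to fit my needs (which may be part of the problem, although I get the same error with the original code). In particular, I'm not using it to perform the mail merge at this point; I only want to identify the fields. This is what I have: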
$newFile = '/var/www/mysite.com/public_html/template.docx';
$zip = new ZipArchive();
if( $zip->open( $newFile, ZIPARCHIVE::CHECKCONS ) !== TRUE ) { echo 'failed to open template'; exit; }
$file = 'word/document.xml';
$data = $zip->getFromName( $file );
$zip->close();
$doc = new DOMDocument();
$doc->loadXML( $data );
$wts = $doc->getElementsByTagNameNS('http://schemas.openxmlformats.org/wordprocessingml/2006/main', 'fldChar');
$mergefields = array();
function getMailMerge(&$wts, $index) {
    $loop = true;
    $counter = $index;
    $startfield = false;
    while ($loop) {
        if ($wts->item($counter)->attributes->item(0)->nodeName == 'w:fldCharType') {
            $nodeName = '';
            $nodeValue = '';
            switch ($wts->item($counter)->attributes->item(0)->nodeValue) {
                case 'begin':
                    if ($startfield) {
                        $counter = getMailMerge($wts, $counter);
                    }
                    $startfield = true;
                    if ($wts->item($counter)->parentNode->nextSibling) {
                        $nodeName = $wts->item($counter)->parentNode->nextSibling->childNodes->item(1)->nodeName;
                        $nodeValue = $wts->item($counter)->parentNode->nextSibling->childNodes->item(1)->nodeValue;
                    }
                    else {
                        // No sibling
                        // check next node
                        $nodeName = $wts->item($counter + 1)->parentNode->previousSibling->childNodes->item(1)->nodeName;
                        $nodeValue = $wts->item($counter + 1)->parentNode->previousSibling->childNodes->item(1)->nodeValue;
                    }
                    if (substr($nodeValue, 0, 11) == ' MERGEFIELD') {
                        $mergefields[] = strtolower(str_replace('"', '', trim(substr($nodeValue, 12))));
                    }
                    $counter++;
                    break;
                case 'separate':
                    $counter++;
                    break;
                case 'end':
                    if ($startfield) {
                        $startfield = false;
                    }
                    $loop = false;
            }
        }
    }
    return $counter;
}
for ($x = 0; $x < $wts->length; $x++) {
    if ($wts->item($x)->attributes->item(0)->nodeName == 'w:fldCharType' && $wts->item($x)->attributes->item(0)->nodeValue == 'begin') {
        $newcount = getMailMerge($wts, $x);
        $x = $newcount;
    }
}
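Opening the DOCX file with ZipArchive() works fine, and if I call print_r($doc->saveHTML()); I can see the XML data just fine. The problem is that when I run my code, I get "Fatal error: Call to a member function item() on a non-object", pointing at this line: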
$nodeName = $wts->item($counter)->parentNode->nextSibling->childNodes->item(1)->nodeName;
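Google has failed me in trying to figure out this error. Can anyone point me in the right direction? Thanks in advance!
Answer 0 (score: 0)
Found a solution. It's not as elegant as I'd hoped, but here goes.
Using xml_parser_create_ns I can search the DOCX file for the key I need, specifically "HTTP://SCHEMAS.OPENXMLFORMATS.ORG/WORDPROCESSINGML/2006/MAIN:INSTRTEXT", which is the index key for the w:instrText elements that hold field instructions, including the ones marked "MERGEFIELD". I can then dump the results into an array and use them to update the database. I.e.: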
// Word file to be opened
$newFile = '/var/www/mysite.com/public_html/template.docx';
// Extract the document.xml file from the DOCX archive
$zip = new ZipArchive();
if( $zip->open( $newFile, ZIPARCHIVE::CHECKCONS ) !== TRUE ) { echo 'failed to open template'; exit; }
$file = 'word/document.xml';
$data = $zip->getFromName( $file );
$zip->close();
// Create the XML parser and create an array of the results
$parser = xml_parser_create_ns();
xml_parse_into_struct($parser, $data, $vals, $index);
xml_parser_free($parser);
// Look up the index entry for w:instrText (the parser upper-cases tag names by default)
$key = 'HTTP://SCHEMAS.OPENXMLFORMATS.ORG/WORDPROCESSINGML/2006/MAIN:INSTRTEXT';
$found = isset($index[$key]) ? $index[$key] : array();
// Cycle *that* array looking for "MERGEFIELD" and grab the field name to yet another array
// Make sure to check for duplicates since fields may be re-used
$mergefields = array();
foreach ($found as $field) {
    $value = isset($vals[$field]['value']) ? $vals[$field]['value'] : '';
    // Skip instructions for other field types (PAGE, DATE, ...)
    if (substr($value, 0, 11) != ' MERGEFIELD') {
        continue;
    }
    $name = strtolower(trim(substr($value, 12)));
    if (!in_array($name, $mergefields)) {
        $mergefields[] = $name;
    }
}
// View the fruits of your labor
print_r($mergefields);
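For comparison, the same lookup can probably be done with DOMXPath instead of the struct parser. This is only a minimal sketch, not tested against real templates: the function name extractMergeFields and the inline $sample document are made up for illustration, and it assumes each field instruction sits in a single w:instrText element (Word may split an instruction across several runs, which this does not handle). It also checks the w:instr attribute of w:fldSimple, which is where simple fields keep their instruction.

```php
<?php
// Hypothetical helper (not from the original post): collect MERGEFIELD names
// from the contents of word/document.xml using DOMXPath.
function extractMergeFields($documentXml)
{
    $doc = new DOMDocument();
    if (!$doc->loadXML($documentXml)) {
        return array();
    }

    $xpath = new DOMXPath($doc);
    $xpath->registerNamespace('w', 'http://schemas.openxmlformats.org/wordprocessingml/2006/main');

    $fields = array();
    // Complex fields keep their instruction in w:instrText;
    // simple fields keep it in the w:instr attribute of w:fldSimple.
    foreach ($xpath->query('//w:instrText | //w:fldSimple/@w:instr') as $node) {
        // Match MERGEFIELD followed by a quoted or a bare field name
        if (preg_match('/^\s*MERGEFIELD\s+(?:"([^"]+)"|(\S+))/', $node->nodeValue, $m)) {
            $name = strtolower($m[1] !== '' ? $m[1] : $m[2]);
            // Fields may be reused in a template, so store each name once
            if (!in_array($name, $fields, true)) {
                $fields[] = $name;
            }
        }
    }
    return $fields;
}

// Made-up sample standing in for the real word/document.xml contents
$sample = '<w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main"><w:body><w:p>'
    . '<w:r><w:fldChar w:fldCharType="begin"/></w:r>'
    . '<w:r><w:instrText xml:space="preserve"> MERGEFIELD  FirstName  \* MERGEFORMAT </w:instrText></w:r>'
    . '<w:r><w:fldChar w:fldCharType="end"/></w:r>'
    . '<w:fldSimple w:instr=" MERGEFIELD &quot;Last Name&quot; "/>'
    . '<w:r><w:instrText> PAGE </w:instrText></w:r>'
    . '</w:p></w:body></w:document>';

print_r(extractMergeFields($sample));
```

With a real template you would pass in the $data string read from the archive rather than $sample. The regular expression keeps only the field name, so trailing switches such as \* MERGEFORMAT are dropped instead of ending up in the database.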