Best way to generate proper markup for inserting into WordPress from PHP (importing from another CMS)

Date: 2017-01-12 17:40:20

Tags: php wordpress content-management-system database-migration markup

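I have been assigned to import a large amount of content from a database belonging to a proprietary CMS into a fresh WordPress installation. I have written the PHP script that retrieves the entries and inserts them with wp_insert_post(), but I have now run into a problem.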

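What I want to do is "filter" my input string, i.e. the source content, so that it matches the format WordPress itself uses when content is copied and pasted into the built-in editor. For example, this is what that looks like: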

<strong>UIR e OER</strong>

&nbsp;

Os verbos terminados em <strong>-uir</strong> e <strong>-oer</strong> terão as 2ª e 3ª pessoas do singular do presente do indicativo escritas com <strong>-i-</strong>:

<strong> </strong>

<strong>– tu possuis</strong>

<strong>– ele possui</strong>

<strong>– tu constróis</strong>

...

And this is how the original content comes out of the source database:

<p>&nbsp;<b style="line-height: 150%; text-align: center;"><span style="font-size:13.5pt;line-height:150%;  font-family:&quot;Arial&quot;,&quot;sans-serif&quot;;mso-fareast-font-family:&quot;Times New Roman&quot;;  mso-fareast-language:PT-BR">UIR e OER</span></b></p>  <p class="MsoNormal" style="mso-margin-bottom-alt:auto;line-height:150%"><span style="font-size:12.0pt;line-height:150%;font-family:&quot;Times New Roman&quot;,&quot;serif&quot;;  mso-fareast-font-family:&quot;Times New Roman&quot;;mso-fareast-language:PT-BR">&nbsp;<o:p></o:p></span></p>  <p class="MsoNormal" style="mso-margin-bottom-alt:auto;line-height:150%"><span style="font-size:12.0pt;line-height:150%;font-family:&quot;Arial&quot;,&quot;sans-serif&quot;;  mso-fareast-font-family:&quot;Times New Roman&quot;;mso-fareast-language:PT-BR">Os verbos terminados em <b>-uir</b> e <b>-oer</b> ter&atilde;o as 2&ordf; e 3&ordf; pessoas do singular do presente do&nbsp;indicativo escritas com <b>-i-</b>:<b> <o:p></o:p></b></span></p>  <p class="MsoNormal" style="mso-margin-bottom-alt:auto;line-height:150%"><b><span style="font-size:12.0pt;line-height:150%;font-family:&quot;Arial&quot;,&quot;sans-serif&quot;;  mso-fareast-font-family:&quot;Times New Roman&quot;;mso-fareast-language:PT-BR">&nbsp;</span></b></p>  <p class="MsoNormal" style="mso-margin-bottom-alt:auto;line-height:150%"><b><span style="font-size:12.0pt;line-height:150%;font-family:&quot;Arial&quot;,&quot;sans-serif&quot;;  mso-fareast-font-family:&quot;Times New Roman&quot;;mso-fareast-language:PT-BR">- tu possuis<o:p></o:p></span></b></p>  <p class="MsoNormal" style="mso-margin-bottom-alt:auto;line-height:150%"><b><span style="font-size:12.0pt;line-height:150%;font-family:&quot;Arial&quot;,&quot;sans-serif&quot;;  mso-fareast-font-family:&quot;Times New Roman&quot;;mso-fareast-language:PT-BR">- ele possui<o:p></o:p></span></b></p>  <p class="MsoNormal" style="mso-margin-bottom-alt:auto;line-height:150%"><b><span 
style="font-size:12.0pt;line-height:150%;font-family:&quot;Arial&quot;,&quot;sans-serif&quot;;  mso-fareast-font-family:&quot;Times New Roman&quot;;mso-fareast-language:PT-BR">- tu constr&oacute;is<o:p></o:p></span></b></p>  

At first it seemed that wp_insert_post() would handle this automatically. It does do some processing, but not enough.

Here is what the import script ends up storing:

<p>&nbsp;<b style="line-height: 150%; text-align: center;"><span style="font-size:13.5pt;line-height:150%;
font-family:&quot;Arial&quot;,&quot;sans-serif&quot;;mso-fareast-font-family:&quot;Times New Roman&quot;;
mso-fareast-language:PT-BR">UIR e OER</span></b></p>
<p class="MsoNormal" style="mso-margin-bottom-alt:auto;line-height:150%"><span style="font-size:12.0pt;line-height:150%;font-family:&quot;Times New Roman&quot;,&quot;serif&quot;;
mso-fareast-font-family:&quot;Times New Roman&quot;;mso-fareast-language:PT-BR">&nbsp;<o:p></o:p></span></p>
<p class="MsoNormal" style="mso-margin-bottom-alt:auto;line-height:150%"><span style="font-size:12.0pt;line-height:150%;font-family:&quot;Arial&quot;,&quot;sans-serif&quot;;
mso-fareast-font-family:&quot;Times New Roman&quot;;mso-fareast-language:PT-BR">Os verbos terminados em <b>-uir</b> e <b>-oer</b> ter&atilde;o as 2&ordf; e 3&ordf; pessoas do singular do presente do&nbsp;indicativo escritas com <b>-i-</b>:<b> <o:p></o:p></b></span></p>
<p class="MsoNormal" style="mso-margin-bottom-alt:auto;line-height:150%"><b><span style="font-size:12.0pt;line-height:150%;font-family:&quot;Arial&quot;,&quot;sans-serif&quot;;
mso-fareast-font-family:&quot;Times New Roman&quot;;mso-fareast-language:PT-BR">&nbsp;</span></b></p>
<p class="MsoNormal" style="mso-margin-bottom-alt:auto;line-height:150%"><b><span style="font-size:12.0pt;line-height:150%;font-family:&quot;Arial&quot;,&quot;sans-serif&quot;;
mso-fareast-font-family:&quot;Times New Roman&quot;;mso-fareast-language:PT-BR">- tu possuis<o:p></o:p></span></b></p>
<p class="MsoNormal" style="mso-margin-bottom-alt:auto;line-height:150%"><b><span style="font-size:12.0pt;line-height:150%;font-family:&quot;Arial&quot;,&quot;sans-serif&quot;;
mso-fareast-font-family:&quot;Times New Roman&quot;;mso-fareast-language:PT-BR">- ele possui<o:p></o:p></span></b></p>
<p class="MsoNormal" style="mso-margin-bottom-alt:auto;line-height:150%"><b><span style="font-size:12.0pt;line-height:150%;font-family:&quot;Arial&quot;,&quot;sans-serif&quot;;
mso-fareast-font-family:&quot;Times New Roman&quot;;mso-fareast-language:PT-BR">- tu constr&oacute;is<o:p></o:p></span></b></p>

My first idea was to write a function of my own based on preg_replace() and html_entity_decode(), but it seems to me there should be a more elegant solution. Is there one?

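Edit: In other words, does PHP (or WordPress itself) provide a way to process content the way TinyMCE (WordPress's built-in editor) does? Of course, I cannot rely on TinyMCE itself, since it is a JavaScript tool.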

1 Answer:

Answer 0 (score: 0)

We had to do the same thing in a recent project. We used the following approaches:

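  1. preg_replace for the simplest tasks.
  2. DOMDocument. This is an excellent PHP tool for parsing HTML.
  3. (Non-PHP) The main import was done through Node.js. With a few necessary tweaks, the wp-cli node module is a great tool for manipulating a WordPress environment. We could then use cheeriojs to parse and modify the HTML.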
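
As a rough illustration of how items 1 and 2 could be combined on the Word-style markup from the question, here is a minimal sketch. The function name clean_word_markup() and the wrapper id import-root are made up for this example, and the cleanup rules (drop <o:p>, strip style/class attributes, unwrap <span>, turn <b> into <strong>) are assumptions about what the target markup should look like, not something WordPress or TinyMCE is known to do in exactly this way. It also assumes the input is UTF-8.

```php
<?php
// Hypothetical helper, not part of WordPress or of the original answer.
function clean_word_markup(string $html): string
{
    // Item 1: a simple job for preg_replace. Remove Word's <o:p> and </o:p> tags.
    $html = preg_replace('~</?o:p\b[^>]*>~i', '', $html);

    // Item 2: DOMDocument for the structural cleanup.
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // keep parser warnings quiet
    // The XML encoding hint is a common workaround so libxml reads the input as UTF-8.
    // The wrapper div (made-up id) lets us find the fragment again after parsing.
    $doc->loadHTML('<?xml encoding="UTF-8"><div id="import-root">' . $html . '</div>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);

    // Strip presentational attributes (assumed unwanted in the imported posts).
    foreach (iterator_to_array($xpath->query('//@style | //@class')) as $attr) {
        $attr->ownerElement->removeAttributeNode($attr);
    }

    // Unwrap every <span>: move its children in front of it, then remove it.
    foreach (iterator_to_array($xpath->query('//span')) as $span) {
        while ($span->firstChild !== null) {
            $span->parentNode->insertBefore($span->firstChild, $span);
        }
        $span->parentNode->removeChild($span);
    }

    // Replace <b> with <strong>, keeping the children.
    foreach (iterator_to_array($xpath->query('//b')) as $b) {
        $strong = $doc->createElement('strong');
        while ($b->firstChild !== null) {
            $strong->appendChild($b->firstChild);
        }
        $b->parentNode->replaceChild($strong, $b);
    }

    // Serialize only the children of the wrapper div.
    $root = $xpath->query('//div[@id="import-root"]')->item(0);
    $out = '';
    foreach ($root->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return trim($out);
}

// Usage with one paragraph shaped like the source content in the question.
$sample = '<p class="MsoNormal" style="line-height:150%"><b>'
        . '<span style="font-size:12.0pt">- tu possuis<o:p></o:p></span></b></p>';
echo clean_word_markup($sample), "\n";
```

The returned string could then be passed as post_content to wp_insert_post(), which the question already uses. The parser decodes named entities such as &atilde; while loading, so a separate html_entity_decode() pass is probably not needed, though how non-ASCII characters are written back out can depend on the libxml version, so it is worth checking a few real posts before running the full import.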