Extracting a portion of an HTML file

Date: 2017-12-05 16:13:19

Tags: php html html5 dompdf

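I use dompdf to create PDF files. So far I have used only part of an HTML file, namely the content between <table> and </table>, to generate the PDF (cutting and pasting by hand).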

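Now that I have working PDF output, I want to automate the process further.

I want to copy everything between the table tags

<table> </table>

to a file, and would like to know the possible options in PHP. Any suggestions are much appreciated.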

2 answers:

Answer 0: (score: 0)

Don't use regular expressions; use DOMDocument instead.

The following class extracts the content inside any element. Load the HTML from your file, or simply pass in the result of ob_get_contents().

<?php 

class DOMExtract extends DOMDocument
{
    private $source;
    private $dom;

    public function __construct()
    {
        // Initialise the underlying DOMDocument before setting its properties
        parent::__construct();
        libxml_use_internal_errors(true);
        $this->preserveWhiteSpace = false;
        $this->strictErrorChecking = false;
        $this->formatOutput = true;
    }

    public function setSource($source)
    {
        $this->source = $source;
        return $this;
    }

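    /**
     * Return the inner HTML (or the text, when $nodeValue is true) of every
     * <$tag> element. $id optionally filters by attribute, as "name=value".
     */
    public function getInnerHTML($tag, $id=null, $nodeValue = false)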
    {
        if (empty($this->source)) {
            throw new Exception('Error: Missing $this->source, use setSource() first');
        }

        $this->loadHTML($this->source);
        $tmp = $this->getElementsByTagName($tag);
        $ret = null;
        foreach ($tmp as $v) {
            if ($id !== null) {
                $attr = explode('=', $id);
                if ($v->getAttribute($attr[0]) == $attr[1]) {
                    if ($nodeValue == true) {
                        $ret .= trim($v->nodeValue);
                    } else {
                        $ret .= $this->innerHTML($v);
                    }
                }
            } else {
                if ($nodeValue == true) {
                    $ret .= trim($v->nodeValue);
                } else{
                    $ret .= $this->innerHTML($v);
                }
            }
        }
        return $ret;
    }

    protected function innerHTML($dom)
    {
        $ret = "";
        foreach ($dom->childNodes as $v) {
            $tmp = new DOMDocument();
            $tmp->appendChild($tmp->importNode($v, true));
            $ret .= trim($tmp->saveHTML());
        }
        return $ret;
    }

}

$html = '
<h3>HTML Table Example</h3>
<div>
<table id="customers">
  <tr>
    <th>Company</th>
    <th>Contact</th>
    <th>Country</th>
  </tr>
  <tr>
    <td>Alfreds Futterkiste</td>
    <td>Maria Anders</td>
    <td>Germany</td>
  </tr>
  <tr>
    <td>Centro comercial Moctezuma</td>
    <td>Francisco Chang</td>
    <td>Mexico</td>
  </tr>
  <tr>
    <td>Ernst Handel</td>
    <td>Roland Mendel</td>
    <td>Austria</td>
  </tr>
  <tr>
    <td>Island Trading</td>
    <td>Helen Bennett</td>
    <td>UK</td>
  </tr>
  <tr>
    <td>Laughing Bacchus Winecellars</td>
    <td>Yoshi Tannamuri</td>
    <td>Canada</td>
  </tr>
  <tr>
    <td>Magazzini Alimentari Riuniti</td>
    <td>Giovanni Rovelli</td>
    <td>Italy</td>
  </tr>
</table>
</div>';

$dom = new DOMExtract();
$dom->setSource($html);

echo '
<table cellspacing="0" cellpadding="3" border="0" width="100%">',
    //match and return only tables inner content with id=customers
    $dom->getInnerHTML('table', 'id=customers'), 
    //match all tables inner content
    //$dom->getInnerHTML('table'), 
'</table>';

https://3v4l.org/OkbQW

<table cellspacing="0" cellpadding="3" border="0" width="100%"><tr><th>Company</th>
    <th>Contact</th>
    <th>Country</th>
  </tr><tr><td>Alfreds Futterkiste</td>
    <td>Maria Anders</td>
    <td>Germany</td>
  </tr><tr><td>Centro comercial Moctezuma</td>
    <td>Francisco Chang</td>
    <td>Mexico</td>
  </tr><tr><td>Ernst Handel</td>
    <td>Roland Mendel</td>
    <td>Austria</td>
  </tr><tr><td>Island Trading</td>
    <td>Helen Bennett</td>
    <td>UK</td>
  </tr><tr><td>Laughing Bacchus Winecellars</td>
    <td>Yoshi Tannamuri</td>
    <td>Canada</td>
  </tr><tr><td>Magazzini Alimentari Riuniti</td>
    <td>Giovanni Rovelli</td>
    <td>Italy</td>
  </tr></table>
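
If the goal is just to copy one table, tags included, into a file, a shorter route may be enough. The following is a minimal sketch, not part of the answer above: extractTableById() is a hypothetical helper name, the sample HTML is made up, and the temp-file location is only an assumption for illustration. It relies on DOMDocument::saveHTML() accepting a node argument, which serializes just that subtree.

```php
<?php
// Hypothetical helper (not from the answer above): returns the outer HTML of
// the first <table> whose id attribute matches, or null when none is found.
function extractTableById(string $html, string $id): ?string
{
    // Collect libxml warnings internally instead of emitting them for sloppy HTML
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    foreach ($doc->getElementsByTagName('table') as $table) {
        if ($table->getAttribute('id') === $id) {
            // Passing a node to saveHTML() serializes only that subtree
            return $doc->saveHTML($table);
        }
    }
    return null;
}

// Made-up sample input; in practice this could come from file_get_contents()
// or from an output buffer.
$html = '<h3>Report</h3><div><table id="customers">'
      . '<tr><td>Alfreds Futterkiste</td></tr></table></div>';

$table = extractTableById($html, 'customers');

// Assumed output location: a temporary file, chosen only for this sketch.
$target = tempnam(sys_get_temp_dir(), 'tbl');
if ($table !== null) {
    file_put_contents($target, $table);
}
```

The saved file can then be handed to dompdf in place of the manually pasted fragment.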

Answer 1: (score: -1)

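Try this. To extract the data between the tags, use the code below. Here $source is your complete HTML code, and $match will hold the data extracted from between the tags.

Code: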

// Note: this pattern only matches a bare <table> tag with no attributes.
preg_match("'<table>(.*?)</table>'si", $source, $match);
if ($match) {
    echo "result=" . $match[1];
}

Reference: Preg match text in php between html tags
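
One caveat with the pattern above can be shown in a small sketch. The sample markup and the relaxed pattern are assumptions added for illustration and are not part of this answer. The literal pattern requires the opening tag to be exactly <table>, so it finds nothing in markup such as <table id="customers">. Allowing attributes after the tag name fixes that case, but the lazy match still stops at the first </table>, so nested tables would be cut short. That is one reason a DOM parser is usually the safer choice.

```php
<?php
// Made-up sample: the opening tag carries an attribute.
$source = '<div><table id="customers"><tr><td>Island Trading</td></tr></table></div>';

// Pattern from the answer: matches only the exact text "<table>".
$strict = preg_match("'<table>(.*?)</table>'si", $source, $strictMatch);

// Assumed relaxation: permit any attributes between the tag name and ">".
$relaxed = preg_match('~<table\b[^>]*>(.*?)</table>~si', $source, $relaxedMatch);

if ($strict === 0) {
    echo "strict pattern: no match\n";
}
if ($relaxed === 1) {
    echo "result=" . $relaxedMatch[1] . "\n";
}
```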