When scraping HTML, the date value from one of the tables comes back null

Time: 2017-10-06 04:47:54

Tags: php html mysql

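Here is my problem: I am scraping a date from two tables and converting it from text containing special characters into a date format. It works for one table, but the date from the other table comes back as 00:00:00. The tables, code, and output are below.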

Table 1

<TABLE class="tab1" border="1" cellpadding="0" cellspacing="0" 
summary="">
<TR>
<TH align=left colspan=2 bgcolor=#0066CC><H1> &nbsp;Start RIP Job</H1>
</TH>
</TR>
<TR>
<TH align=left> &nbsp; &nbsp;Send Date:
</TH>
<TD class="td1" align=left> &nbsp; &nbsp;1/9/2017 1:15 PM&nbsp; &nbsp;
</TD>
</TR>
<TR>
<TH align=left> &nbsp; &nbsp;RIP Start Date and Time:
</TH>
<TD class="td1" align=left> &nbsp; &nbsp;13:21:22 09/01/2017&nbsp; &nbsp;
</TD>
</TR>
<TR>
<TH align=left> &nbsp; &nbsp;RIP End Date and Time:
</TH>
<TD class="td1" align=left> &nbsp; &nbsp;13:21:33 09/01/2017&nbsp; &nbsp;
</TD>
</TR>
<TR>
<TH align=left> &nbsp; &nbsp;RIP Duration:
</TH>
<TD class="td1" align=left> &nbsp; &nbsp;11 seconds&nbsp; &nbsp;
</TD>
</TR>
<TR>
<TH align=left colspan=2 bgcolor=#0066CC><H1> &nbsp;End RIP Job</H1>
</TH>
</TR>
</TABLE>

Table 2

<TABLE class="tab1" border="1" cellpadding="0" cellspacing="0" 
summary="">
<TR>
<TH align=left colspan=2 bgcolor=#0066CC><H1> &nbsp;Start RIP Job</H1>
</TH>
</TR>
<TR>
<TH align=left> &nbsp; &nbsp;Printer:
</TH>
<TD class="td1" align=left> &nbsp; &nbsp;RunJiang Flora 3204P&nbsp; 
&nbsp;
</TD>
</TR>
<TR>
<TH align=left> &nbsp; &nbsp;Send Date:
</TH>
<TD class="td1" align=left> &nbsp; &nbsp;9/29/2017 10:09 PM&nbsp; &nbsp;
</TD>
</TR>
<TR>
<TH align=left> &nbsp; &nbsp;RIP Start Date and Time:
</TH>
<TD class="td1" align=left> &nbsp; &nbsp;22:09:49 29/09/2017&nbsp; &nbsp;
</TD>
</TR>
<TR>
<TH align=left> &nbsp; &nbsp;RIP End Date and Time:
</TH>
<TD class="td1" align=left> &nbsp; &nbsp;22:10:13 29/09/2017&nbsp; &nbsp;
</TD>
</TR>
<TR>
<TH align=left> &nbsp; &nbsp;RIP Duration:
</TH>
<TD class="td1" align=left> &nbsp; &nbsp;24 seconds&nbsp; &nbsp;
</TD>
</TR>
<TR>
<TH align=left colspan=2 bgcolor=#0066CC><H1> &nbsp;End RIP Job</H1>
</TH>
</TR>
</TABLE>

Code:

$source=file_get_contents("C://xampp/htdocs/Champion/machine-logs/LogPrinting04/nulldate.HTML");
$dom = new DOMDocument();

$dom->loadHTML($source);
// print_r($dom);
$xp = new DOMXPath($dom);

$textList = $xp->query("//table[//th[contains(text(),'')]]");
foreach ($textList as $text) {
    // Read the text of the <td> next to the "end date" header inside this table
    $enddate = $xp->evaluate(
        "string(descendant::tr[th[contains(text(),'RIP End Date and Time') or contains(text(),'Output End Date And Time')]]/td/text())",
        $text
    );
    // Keep only alphanumerics, spaces, and / : - (this drops the &nbsp; characters)
    $date = preg_replace("/[^0-9a-zA-Z \/:\-]/", "", $enddate);
    $xtime = strtotime($date);
    $tes = date("Y-m-d H:i:s", $xtime);
    echo "enddate=" . $tes . PHP_EOL;
}

Output:

Table 1: 2017-09-01 13:21:33

Table 2: 0000-00-00 00:00:00

1 answer:

Answer 0 (score: 0)

'29' is not a valid month value.

Look at the one that does return a value. Note that it comes back as September 1, not January 9.

  13:21:33 09/01/2017    ->  2017-09-01 13:21:33
           mm dd yyyy        yyyy mm dd 

Then look at the one that returns zeros.

  22:10:13 29/09/2017    
           mm dd yyyy
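
Both cases can be reproduced outside the scraper. The snippet below is a minimal sketch, assuming the cleaned strings look like the ones shown above; the timezone is pinned to UTC only so the output does not depend on the local php.ini.

```php
<?php
// Pin the timezone so the formatted output is the same on every machine.
date_default_timezone_set('UTC');

// With slashes, strtotime() reads the date part as mm/dd/yyyy.
$first  = strtotime('13:21:33 09/01/2017'); // month 09, day 01
$second = strtotime('22:10:13 29/09/2017'); // "month 29" cannot be parsed

echo date('Y-m-d H:i:s', $first), PHP_EOL;  // 2017-09-01 13:21:33
var_dump($second);                          // bool(false)
```

Since `strtotime()` returns `false` for the second string, the value passed on to `date()` is not a real timestamp, which is why no usable date comes out for that table.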

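29 is not a valid value for a month. (If we expect this input to represent September 29, then we should also expect the first one to represent January 9.) The Send Date rows in the two tables, 1/9/2017 and 9/29/2017, are consistent with that reading: the RIP timestamps are written day first, then month.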
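
One way to handle this is to stop letting `strtotime()` guess and state the format explicitly with `DateTime::createFromFormat()`. The following is only a sketch under the assumption that every RIP timestamp has the shape `HH:MM:SS dd/mm/yyyy`; the helper name `parseRipDate` is hypothetical and not part of the original code.

```php
<?php
date_default_timezone_set('UTC');

/**
 * Hypothetical helper: turn scraped "HH:MM:SS dd/mm/yyyy" text into a
 * "Y-m-d H:i:s" string, or return null when the text does not match.
 */
function parseRipDate(string $raw): ?string
{
    // Same cleanup as in the question, plus trim() for the leftover spaces.
    $clean = trim((string) preg_replace('/[^0-9a-zA-Z \/:\-]/', '', $raw));

    // Explicit format: day first, then month, so 29/09 is September 29.
    $dt = DateTime::createFromFormat('H:i:s d/m/Y', $clean);
    if ($dt === false) {
        return null;
    }

    return $dt->format('Y-m-d H:i:s');
}

// The inputs mimic the <td> text, including the non-breaking spaces.
echo parseRipDate(" \u{00A0} \u{00A0}22:10:13 29/09/2017\u{00A0} \u{00A0}"), PHP_EOL; // 2017-09-29 22:10:13
echo parseRipDate(" \u{00A0} \u{00A0}13:21:33 09/01/2017\u{00A0} \u{00A0}"), PHP_EOL; // 2017-01-09 13:21:33
```

In the loop from the question, `parseRipDate($enddate)` would take the place of the `preg_replace()`, `strtotime()`, and `date()` calls. Note that `createFromFormat()` rolls out-of-range values such as 31/02 over into the next month instead of failing, so stricter validation would also need to inspect `DateTime::getLastErrors()`.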