Question

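I'm currently web scraping information from a service that doesn't provide JSON. The goal is to take the markup below (a small excerpt) and scrape only the items that are currently importing, then generate a JSON array in PHP.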

So, for example, from the first entry I want:

House M.D. (Season 4) (Episodes 1 - 5)
0-S_419212
Theater
DVD
Jun. 02, 6:05 pm In progress, 14 minutes left

The next entry does not have state='active', so it should be skipped.

Sample code

<tr class="import_row" handle="3f0761be271334a-L1_257" selection_handle="0-S_419212" state="active" utcstart="1464912324">
    <td valign="top" class="start_time" nowrap="">Jun. 02,  6:05 pm</td> 
    <td valign="top" class="title">House M.D. (Season 4) (Episodes 1 - 5)</td>
    <td valign="top" class="reader">Theater</td>
    <td valign="top" class="type">DVD</td>
    <td valign="top" class="status">In progress, 14 minutes left</td>
    <td valign="top" class="edit"></td>
</tr>


<tr class="import_row" handle="3f0761be271334a-L1_255" selection_handle="0-S_4c6be1" state="completed" utcstart="1464673067">
    <td valign="top" class="start_time" nowrap="">May. 30, 11:37 pm</td> 
    <td valign="top" class="title"><a href="javascript:getDetails('0-S_4c6be1');">National Treasure 2: Book of Secrets (Feature)</a></td>
    <td valign="top" class="reader">Theater</td>
    <td valign="top" class="type">DVD</td>
    <td valign="top" class="status">Completed in 26 minutes</td>
    <td valign="top" class="edit"></td>
</tr>

Here is also the code I use to pull the information into PHP

$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, 'http://10.1.1.150/home/index.html');
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, TRUE);
$cookie_file = "cookie.txt";
curl_setopt($ch, CURLOPT_COOKIESESSION, true);
curl_setopt($ch, CURLOPT_COOKIEFILE, $cookie_file);
curl_setopt($ch, CURLOPT_COOKIEJAR, $cookie_file);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
$input_lines = curl_exec($ch);
curl_close($ch);
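
For illustration, here is one possible way to get from the fetched HTML to JSON. It is only a sketch under assumptions, not part of the original code: it uses PHP's bundled DOMDocument and DOMXPath classes, the function name extractActiveImports is hypothetical, and the JSON keys are simply taken from each cell's class attribute (the empty "edit" cell is left out by choice). It also assumes the rows can be matched with class="import_row" and state="active" exactly as in the sample markup.

```php
<?php
// Hypothetical helper (name is an assumption): returns one associative
// array per <tr class="import_row"> whose state attribute is "active".
function extractActiveImports(string $html): array
{
    if (trim($html) === '') {
        return []; // nothing to parse (loadHTML rejects empty input on PHP 8)
    }

    $doc = new DOMDocument();
    // Collect parser warnings internally instead of printing them;
    // scraped markup is rarely perfectly valid HTML.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // Keep only rows that are currently importing.
    $rows = $xpath->query('//tr[@class="import_row"][@state="active"]');

    $items = [];
    foreach ($rows as $row) {
        $item = ['selection_handle' => $row->getAttribute('selection_handle')];
        foreach ($xpath->query('./td[@class]', $row) as $cell) {
            $key = $cell->getAttribute('class');
            if ($key === 'edit') {
                continue; // assumption: the empty "edit" cell is not needed
            }
            // Collapse runs of whitespace such as "Jun. 02,  6:05 pm".
            $item[$key] = trim(preg_replace('/\s+/', ' ', $cell->textContent));
        }
        $items[] = $item;
    }
    return $items;
}

// Demo input: a trimmed copy of the two sample rows, wrapped in a <table>.
$sample = <<<'HTML'
<table>
<tr class="import_row" selection_handle="0-S_419212" state="active">
    <td class="title">House M.D. (Season 4) (Episodes 1 - 5)</td>
    <td class="status">In progress, 14 minutes left</td>
    <td class="edit"></td>
</tr>
<tr class="import_row" selection_handle="0-S_4c6be1" state="completed">
    <td class="title"><a href="#">National Treasure 2: Book of Secrets (Feature)</a></td>
    <td class="status">Completed in 26 minutes</td>
</tr>
</table>
HTML;

echo json_encode(extractActiveImports($sample), JSON_PRETTY_PRINT), PHP_EOL;
```

With the cURL code above, the call would presumably be extractActiveImports($input_lines), after first checking that curl_exec() did not return false.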

PHP Webscraping to JSON

0 answers: