使用正则表达式

时间:2018-02-12 08:18:39

标签: php jquery json regex

我正在编写一个爬虫来废弃在线商店的价格信息,我使用的是Goutte PHP并且它不支持javascript交互,它只是抓取静态HTML DOM,所以问题出在我的爬虫价格,库存信息的回复中隐藏在脚本标记内的HTML DOM中,该标记进一步用JSON封装,这是我得到的数据

<script>
HZ.productVariation.Manager.setSpaceId('33503761');
HZ.data.Variations.put('33503761', {"availVar": [{"id": "c", "label": "Color", "options": [{"name": "Chrome", "avail": 1, "stock": 1, "price": "$174.51", "quantity": "52", "imageId": "3eb1230d05775d3c"}, {"name": "SuperSteel", "avail": 1, "stock": 0, "price": "$341.40", "quantity": "0", "imageId": "d0a126f505775d3e"}]}], "curVar": {"c": "SuperSteel"}, "exactMatch": true});
HZ.productVariation.Manager.setSelector(HZ.productVariation.ListSelector);
HZ.productVariation.Manager.setRenderer(HZ.productVariation.ViewSpaceRenderer);
HZ.productVariation.Manager.setHistoryManager(new HZ.productVariation.BrowserHistoryManager("replace"));
$('.variationSelectors').append(HZ.productVariation.Manager.drawSelectors('33503761'));

HZ.productVariation.Manager.initUI();
</script>

我想创建一个正则表达式,我给它键,它将返回值,让我们说我想获得“股票”的价值所以当我在正则表达式中插入股票密钥时,它将返回与密钥库存相关的所有值作为PHP数组,当我插入具有对象数组作为值的键时,应该返回嵌套数组。

这是编码尝试:

$re = '/{"availVar":(.*)}/';
preg_match_all($re, $string, $output_array, PREG_SET_ORDER, 0);
$json = json_decode($output_array[0][0], true);

目前我正在遍历这个已解码的json数组并获取库存状态,我想要做的只是编写一个接收密钥的函数,应用正则表达式并返回此值作为结果。

无论如何都要创建这种类型的正则表达式?请指教,谢谢

2 个答案:

答案 0 :(得分:0)

解析Json并使用json解码 我假设33503761是某种不应该出现在Json中的id。

$pos1 =strpos($html, ".put(")+17; // find "put(" and some id? And skip it
$json = substr($html, $pos1, strpos($html, ");", $pos1)-$pos1); //parse json
$arr =json_decode($json,true);
Var_dump($arr);

https://3v4l.org/jdnC0

要查找库存,您可以使用array_walk_recursive。

function test_print($item, $key, $find)
{
    if($key == $find)
        echo "$key holds $item\n";
}

array_walk_recursive($arr, 'test_print', "stock");

https://3v4l.org/5luKk

答案 1 :(得分:0)

代码:(Demo

function extractColumnData($html,$key){
    if(!preg_match("~\.put\('\d+',\s+(.+?)\);~",$html,$out)){
        return "alert yourself of the preg_match failure";
    }
    if(($array=@json_decode($out[1],true))===null && json_last_error()!==JSON_ERROR_NONE){
       return "alert yourself of the json decode failure";
    }
    return array_column(current(array_column(current($array),'options')),$key,'name');  // this assumes the static structure of your json/array data
}

$html=<<<HTML
<script>
HZ.productVariation.Manager.setSpaceId('33503761');
HZ.data.Variations.put('33503761', {"availVar": [{"id": "c", "label": "Color", "options": [{"name": "Chrome", "avail": 1, "stock": 1, "price": "$174.51", "quantity": "52", "imageId": "3eb1230d05775d3c"}, {"name": "SuperSteel", "avail": 1, "stock": 0, "price": "$341.40", "quantity": "0", "imageId": "d0a126f505775d3e"}]}], "curVar": {"c": "SuperSteel"}, "exactMatch": true});
HZ.productVariation.Manager.setSelector(HZ.productVariation.ListSelector);
HZ.productVariation.Manager.setRenderer(HZ.productVariation.ViewSpaceRenderer);
HZ.productVariation.Manager.setHistoryManager(new HZ.productVariation.BrowserHistoryManager("replace"));
$('.variationSelectors').append(HZ.productVariation.Manager.drawSelectors('33503761'));

HZ.productVariation.Manager.initUI();
</script>
HTML;

echo "avail => ";
var_export(extractColumnData($html,'avail'));
echo "\n----\n";
echo "stock => ";
var_export(extractColumnData($html,'stock'));
echo "\n----\n";
echo "price => ";
var_export(extractColumnData($html,'price'));
echo "\n----\n";
echo "quantity => ";
var_export(extractColumnData($html,'quantity'));
echo "\n----\n";
echo "imageId => ";
var_export(extractColumnData($html,'imageId'));

输出:

avail => array (
  'Chrome' => 1,
  'SuperSteel' => 1,
)
----
stock => array (
  'Chrome' => 1,
  'SuperSteel' => 0,
)
----
price => array (
  'Chrome' => '$174.51',
  'SuperSteel' => '$341.40',
)
----
quantity => array (
  'Chrome' => '52',
  'SuperSteel' => '0',
)
----
imageId => array (
  'Chrome' => '3eb1230d05775d3c',
  'SuperSteel' => 'd0a126f505775d3e',
)