This approach works fine in some cases, but not in the case below.
$xml_url = 'http://campusdining.compass-usa.com/Hofstra/Pages/SignageXML.aspx?location=Student%20Center%20Cafe';
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $xml_url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/5.0 (X11; U; Linux i686; ru; rv:1.9.3a5pre) Gecko/20100526 Firefox/3.7a5pre");
$data = curl_exec($ch);
$ce = curl_error($ch);
curl_close($ch);
// this is how I was doing it prior to today and it worked before
// preg_match_all("/<MealPeriod name=\"(.+?)\">([\w\W\r\n]*?)<\/MealPeriod>/i", $data, $output_array);
// this way doesnt show all the meal periods,
// but I need to know whats in between the MealPeriod tags
// preg_match_all('/<MealPeriod name="(.*?)">(.*?)<\/MealPeriod>/i', $data, $output_array);
// shows all the meal period names,
// but I need the above to work to store whats in between the MealPeriod tags in the $output_array[2]
preg_match_all('/<MealPeriod name="(.*?)">/i', $data, $output_array);
echo '<pre> '.print_r($output_array[1],1).'</pre>';
I tried this on a couple of live regex testing sites; one returned what I need and the other did not. http://www.phpliveregex.com/ - works; https://regex101.com/ - does not work
$output_array[1]
Array
(
[0] => Breakfast
[1] => Every Day
[2] => Outtakes
[3] => Salad Bar
)
But the content between the tags should also be in $output_array[2].
Any help is greatly appreciated.
Answer 0 (score: 0)
The following code works; all I did was change the regex and change the print.
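The output on screen looks odd because the second (.*?), which captures everything between <MealPeriod> and </MealPeriod>, also captures all the XML tags, and the browser treats them as markup. You can see this clearly if you view the page source.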
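I suggest you use an XML parser to process the document. I have certainly used regex to extract part of an XML document before converting it to objects with a parser, but a parser handles XML far better than regex does (by leaps and bounds).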
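As a rough illustration of the parser suggestion, here is a minimal sketch using PHP's SimpleXML extension. The post does not show what the feed contains inside each MealPeriod, so the sample document, the <Menu> root and the <Item> children are assumptions made up for this example; only the MealPeriod element and its name attribute come from the question.

```php
<?php
// Hypothetical sample data: <Menu> and <Item> are invented for illustration.
// In the real script, $data would be the string returned by curl_exec().
$data = <<<XML
<Menu>
  <MealPeriod name="Breakfast">
    <Item>Oatmeal</Item>
    <Item>Eggs</Item>
  </MealPeriod>
  <MealPeriod name="Salad Bar">
    <Item>Spinach</Item>
  </MealPeriod>
</Menu>
XML;

// Parse the string into a SimpleXMLElement; false means the XML was not well-formed.
$xml = simplexml_load_string($data);
if ($xml === false) {
    die('Could not parse the XML');
}

// Collect each meal period's name and the text of its child elements.
$periods = [];
foreach ($xml->xpath('//MealPeriod') as $period) {
    $name = (string) $period['name'];
    $periods[$name] = [];
    foreach ($period->children() as $child) {
        $periods[$name][] = trim((string) $child);
    }
}

print_r($periods);
```

With a parser, line breaks and nested tags inside a MealPeriod need no special handling, which is the main thing the regex approaches above struggle with.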
Everything is captured, but it is not all displayed on screen inside the <pre> tag. If you view the page source, however, everything is there.
<?php
$xml_url = 'http://campusdining.compass-usa.com/Hofstra/Pages/SignageXML.aspx?location=Student%20Center%20Cafe';
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $xml_url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/5.0 (X11; U; Linux i686; ru; rv:1.9.3a5pre) Gecko/20100526 Firefox/3.7a5pre");
$data = curl_exec($ch);
$ce = curl_error($ch);
curl_close($ch);
// this is how I was doing it prior to today and it worked before
// preg_match_all("/<MealPeriod name=\"(.+?)\">([\w\W\r\n]*?)<\/MealPeriod>/i", $data, $output_array);
// this way doesnt show all the meal periods,
// but I need to know whats in between the MealPeriod tags
// preg_match_all('/<MealPeriod name="(.*?)">(.*?)<\/MealPeriod>/i', $data, $output_array);
// shows all the meal period names,
// but I need the above to work to store whats in between the MealPeriod tags in the $output_array[2]
preg_match_all('/<MealPeriod name="(.*?)">(.*?)<\/MealPeriod>/i', $data, $output_array);
echo '<pre> '.print_r($output_array,1).'</pre>';
?>
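The remark about captured tags not showing up on screen can be sketched as follows. This is only an illustration: the array below is a made-up stand-in for what preg_match_all() would fill in, and the <Item> element is an assumption. Escaping the dump with htmlspecialchars() makes the browser show the tags as text instead of interpreting them.

```php
<?php
// Stand-in for the $output_array filled by preg_match_all(); the values are made up.
$output_array = [
    1 => ['Breakfast'],
    2 => ['<Item>Oatmeal</Item>'],
];

// print_r(..., true) returns the dump as a string; htmlspecialchars() turns
// <, > and & into entities so the captured XML is visible in the page.
$html = '<pre>' . htmlspecialchars(print_r($output_array, true)) . '</pre>';
echo $html;
```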
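Answer 1 (score: 0)
I found the answer thanks to the following Stack Overflow post: php regex or | operator. I needed to change the regex to the following, and I was finally able to return all the meal periods and their contents in the correct arrays.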
'/<MealPeriod name="(.*?)">(.*?)<\/?MealPeriod>/i'
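One thing worth checking with the patterns in this thread, offered as a possible explanation rather than a confirmed diagnosis: in PCRE, "." does not match a newline unless the s modifier is set, so (.*?) cannot span a MealPeriod block that covers several lines. The sketch below uses a made-up multi-line sample (the <Item> elements are assumptions) to compare the two behaviours, and calls preg_last_error() in case the match fails for another reason, such as hitting the backtrack limit on a large document.

```php
<?php
// Hypothetical multi-line sample; the real feed's inner elements are not shown in the post.
$data = "<MealPeriod name=\"Breakfast\">\n  <Item>Oatmeal</Item>\n</MealPeriod>\n"
      . "<MealPeriod name=\"Salad Bar\">\n  <Item>Spinach</Item>\n</MealPeriod>\n";

// Without the s modifier, "." stops at newlines, so multi-line blocks do not match.
$without = preg_match_all('/<MealPeriod name="(.*?)">(.*?)<\/MealPeriod>/i', $data, $m1);

// With the s modifier, "." also matches newlines.
$with = preg_match_all('/<MealPeriod name="(.*?)">(.*?)<\/MealPeriod>/is', $data, $m2);

// preg_match_all() returns false on failure; preg_last_error() reports the reason.
if ($with === false) {
    echo 'PCRE error code: ' . preg_last_error() . "\n";
}

echo "without s: $without matches, with s: $with matches\n";
print_r($m2[1]); // meal period names
print_r($m2[2]); // content between the tags
```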