Question

I made a simple application to fetch recipe information from sites like allrecipes.com. I'm using preg_match, but something isn't working.

$geturl = file_get_contents("http://allrecipes.com/Recipe/Brown-Sugar-Smokies/Detail.aspx?src=rotd");
preg_match('#<title>(.*) - Allrecipes.com</title>#', $geturl, $match);
$name = $match[1];
echo $name;

I'm just trying to get the page title (minus the " - Allrecipes.com" part) and put it into a variable, but all that comes out is blank.

Answer 1

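If you look at the source of the page, you'll notice that the <title> element contains some whitespace padding around the actual text, which your pattern needs to account for: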

'#<title>\s*(.*) - Allrecipes.com\s*</title>#'
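
As a rough illustration of why the padding matters, the sketch below runs both patterns against a made-up string. The `$html` value is only an assumption about what the fetched page looks like (title text on its own line), not a copy of the real page source.

```php
<?php
// Hypothetical stand-in for the fetched page: the title text sits on its own
// line, padded with whitespace, as described above.
$html = "<html><head><title>\n\tBrown Sugar Smokies Recipe - Allrecipes.com\n</title></head></html>";

// Original pattern: '.' does not match a newline, so nothing is found.
$original = preg_match('#<title>(.*) - Allrecipes.com</title>#', $html, $m1);

// Patched pattern: \s* absorbs the whitespace on both sides of the text.
$patched = preg_match('#<title>\s*(.*) - Allrecipes.com\s*</title>#', $html, $m2);

// Only read the capture group when the match actually succeeded.
$name = $patched === 1 ? $m2[1] : null;
```

Checking the return value before reading `$m2[1]` also avoids the blank output from the question, where a failed match left `$match[1]` undefined.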

Answer 2

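There are two problems with this pattern. First, there is a line break right after <title>, which . does not match (without the /s modifier, . literally means "any character except a newline"). Second, the Allrecipes.com text is not immediately followed by </title>; there is a line break after it as well.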

Given that \s matches both ordinary spaces and line breaks, you can change your regex like this:

'#<title>\s*(.*?) - Allrecipes.com\s*</title>#s'

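The /s modifier is not actually needed here (the title in this recipe is on a single line, and all the "\n" characters are covered by the \s* subexpressions). I would still suggest leaving it in, so that multi-line titles don't catch you off guard.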
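
To see what the /s modifier would buy you, here is a small sketch using an invented title whose text wraps across two lines. The input is hypothetical; the real page may never produce such a title.

```php
<?php
// Hypothetical title whose text itself spans two lines.
$html = "<title>\nBrown Sugar\nSmokies Recipe - Allrecipes.com\n</title>";

$pattern = '#<title>\s*(.*?) - Allrecipes.com\s*</title>#';

// Without /s, '.' cannot cross the newline inside the title text: no match.
$withoutS = preg_match($pattern, $html, $m1);

// With /s, '.' also matches newlines, so the group captures both lines.
$withS = preg_match($pattern . 's', $html, $m2);
```

If a title like this ever shows up, the captured text still contains the inner newline, so you may want to normalize whitespace afterwards.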

I have also replaced .* with .*? for efficiency: since the string you are looking for is quite short, a non-greedy quantifier makes sense here.
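
Besides doing less backtracking, the non-greedy form can also change what gets captured once /s lets . cross line breaks. The sketch below uses a contrived input with two <title> elements (an assumption made purely for illustration) to make the difference visible.

```php
<?php
// Contrived input with two <title> elements, e.g. one nested inside an <svg>.
$html = "<title>First - Allrecipes.com</title><svg><title>Second - Allrecipes.com</title></svg>";

// Greedy: .* runs to the end of the input, then backtracks to the LAST suffix.
preg_match('#<title>\s*(.*) - Allrecipes.com\s*</title>#s', $html, $greedy);

// Non-greedy: .*? grows one character at a time and stops at the FIRST suffix.
preg_match('#<title>\s*(.*?) - Allrecipes.com\s*</title>#s', $html, $lazy);
```

Here the greedy capture swallows everything between the first opening tag and the last suffix, while the non-greedy capture is just `First`.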

Answer 3

You should first grab the whole title, then strip the part you don't want in PHP, like this:

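<?php

// Fetch the page; file_get_contents() returns false on failure.
$raw_html = file_get_contents('http://www.allrecipes.com');
if (empty($raw_html)) {
    throw new \RuntimeException('Fetch empty');
}

// preg_match() returns 1 on a match, 0 on no match, and false on error,
// so anything other than 1 means $matches[1] is not available.
$matches = array();
if (preg_match('/<title>(.*)<\/title>/s', $raw_html, $matches) !== 1) {
    throw new \RuntimeException('Title not found or regex error');
}

$title = trim($matches[1]);

// you should strip your title here
echo $title;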
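
The stripping step left as a comment above could look something like the following. This is only a sketch: `stripSiteSuffix` is a hypothetical helper name, and the default suffix is assumed from the question rather than taken from the site.

```php
<?php
// Hypothetical helper: removes a trailing site suffix from a title, if present.
function stripSiteSuffix(string $title, string $suffix = ' - Allrecipes.com'): string
{
    // Drop the padding around the title text first.
    $title = trim($title);
    $suffixLength = strlen($suffix);

    // Only cut when the title really ends with the suffix.
    if ($suffixLength > 0 && substr($title, -$suffixLength) === $suffix) {
        return rtrim(substr($title, 0, -$suffixLength));
    }

    return $title;
}
```

With this in place, `echo stripSiteSuffix($matches[1]);` would print the bare recipe name, and titles without the suffix pass through unchanged apart from trimming.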

Problem with preg_match

3 answers: