Question

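I wrote a simple script that finds all outgoing <a> tags on a website and displays them.

To do this, I first scrape the sitemap and put its URLs into an array. I then loop through each URL, search the page for <a> tags, and run strpos() on every tag found to check whether it contains any of the URLs I want to ignore.

The script takes about 5 minutes to finish (scraping 500 pages, running locally), and I was wondering whether there is a faster way to handle the needle/haystack search for the excluded params. Currently I'm using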

//SEES IF URL CONTAINS EXCLUDED PARAM
function find_excluded_url ($match_url) {
    return strpos($match_url, "mydomain.co.uk") ||
        strpos($match_url, "tumblr.com") ||
        strpos($match_url, "nofollow") ||
        strpos($match_url, "/archive") || 
        strpos($match_url, "page/2");
}

and then I display the results using

if ( find_excluded_url($element) == false ) {
   echo "<a href='$element->href'>" . $element->href . "</a>";
}

Is there a more efficient way to achieve this?

Sorry if this is a really obvious question; this is the first real thing I've built with PHP.

Answer 1

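Just a note: strpos returns 0 if the needle is at the very beginning of the string, and false if the needle is not in the string at all.

In PHP, 0 and false are equivalent under loose comparison (==), and 0 is also falsy in a boolean context such as ||. This means your script will not recognize links that start with one of the keywords.

So I suggest you change your script to: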

function find_excluded_url ($match_url) {
    return strpos($match_url, "mydomain.co.uk") !== false ||
         strpos($match_url, "tumblr.com") !== false ||
         strpos($match_url, "nofollow") !== false ||
         strpos($match_url, "/archive") !== false || 
         strpos($match_url, "page/2") !== false;
}
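
To make the difference concrete, here is a minimal sketch of what strpos returns in the two cases. The sample URLs are made up for illustration.

```php
<?php
// Needle at the very start of the string: strpos returns int(0), not false.
$pos = strpos("tumblr.com/post/1", "tumblr.com");
var_dump($pos);            // int(0)
var_dump($pos == false);   // bool(true)  -- loose comparison treats 0 as false
var_dump($pos === false);  // bool(false) -- strict comparison sees the match at offset 0

// Needle absent: strpos returns false.
$missing = strpos("example.org/", "tumblr.com");
var_dump($missing === false); // bool(true)
```

If the script only ever runs on PHP 8.0 or later, str_contains() returns a plain boolean and avoids the issue; that version requirement is an assumption about your setup.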

Answer 2

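If you want to check whether one string occurs inside another, you can use one of the following two functions:
http://php.net/manual/en/function.stristr.php
http://php.net/manual/en/function.strstr.php

The warning on strpos: "This function may return Boolean false, but may also return a non-Boolean value which evaluates to false. Please read the section on Booleans for more information. Use the === operator for testing the return value of this function."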

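/**
 * Loops through the array to see if any of
 * its values occurs inside $needle.
 *
 * Note the naming: $needle is the string being searched,
 * and $haystack is the list of substrings to look for.
 *
 * @param  string $needle
 * @param  array  $haystack
 * @return bool
 */
function strstr_array($needle, array $haystack)
{
  foreach($haystack as $search) {
    // Compare against false: strstr can return a falsy string such as "0".
    if(strstr($needle, $search) !== false) {
      return true;
    }
  }
  return false;
}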

$haystack = array('my-domain.com', 'sub.my-domain.com');
var_dump(strstr_array('test my-domain.com or something', $haystack));
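
One detail worth illustrating: strstr returns the tail of the string starting at the match, and that tail can itself be falsy (the string "0"). The sketch below uses a made-up input to show why a strict comparison against false is safer than relying on truthiness.

```php
<?php
// strstr returns the rest of the string from the first match, or false.
$tail = strstr("page/10", "0");
var_dump($tail);            // string(1) "0"
var_dump((bool) $tail);     // bool(false) -- the string "0" is falsy in PHP
var_dump($tail !== false);  // bool(true)  -- the strict check still sees the match
```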

Answer 3

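function find_excluded_url($match_url, $excludeList)
{
    foreach($excludeList as $excluded)
    {
        // Return as soon as any excluded term matches (case-insensitive).
        if(stristr($match_url, $excluded) !== FALSE)
            return TRUE;
    }
    // Only report FALSE after every term has been checked.
    return FALSE;
}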

$excludes = array(
                      'mydomain.co.uk'
                    , 'tumblr.com'
                    , 'nofollow'
                    , '/archive'
                    , 'page/2'
                 );

$example1 = 'http://example.mydomain.co.uk/dir/';
$example2 = 'https://not.in/excludes';
var_dump(find_excluded_url($example1, $excludes));
var_dump(find_excluded_url($example2, $excludes));

// output from browser:  bool(true) bool(false)
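
A minimal sketch of the same list-driven idea using stripos, which reports only the match position instead of building the substring that stristr returns. The name url_is_excluded and the Tumblr sample URL are hypothetical. The final return sits after the loop so that every entry is checked, including ones that are not first in the list.

```php
<?php
// Hypothetical helper: returns true as soon as any entry of $excludeList
// occurs in $url (case-insensitively), false only after all were checked.
function url_is_excluded(string $url, array $excludeList): bool
{
    foreach ($excludeList as $excluded) {
        if (stripos($url, $excluded) !== false) {
            return true;
        }
    }
    return false;
}

$excludes = array('mydomain.co.uk', 'tumblr.com', 'nofollow', '/archive', 'page/2');

// Matches the second list entry, despite the different letter case.
var_dump(url_is_excluded('http://blog.TUMBLR.com/post/1', $excludes)); // bool(true)
var_dump(url_is_excluded('https://not.in/excludes', $excludes));       // bool(false)
```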

Answer 4

Try this:

if (preg_match('/word/i', $str))
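
To apply this to the exclusion list from the question, one option is to combine all terms into a single case-insensitive pattern. This is only a sketch: matches_excluded is a hypothetical helper name, and whether one regex is actually faster than several strpos calls for your pages is something you would need to measure.

```php
<?php
// Exclusion list taken from the question.
$excludes = array('mydomain.co.uk', 'tumblr.com', 'nofollow', '/archive', 'page/2');

// preg_quote escapes regex metacharacters (such as the dots) and, via its
// second argument, the '/' delimiter; the quoted terms are joined with '|'.
$quoted = array_map(function ($term) {
    return preg_quote($term, '/');
}, $excludes);
$pattern = '/' . implode('|', $quoted) . '/i';

// Hypothetical helper; preg_match returns 1 on a match and 0 otherwise.
function matches_excluded(string $url, string $pattern): bool
{
    return preg_match($pattern, $url) === 1;
}

var_dump(matches_excluded('http://example.mydomain.co.uk/dir/', $pattern)); // bool(true)
var_dump(matches_excluded('https://not.in/excludes', $pattern));            // bool(false)
```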

Better way to write multiple strpos calls
