Question

Please help me strip the following more efficiently.

a href="/mv/test-1-2-3-4.vFIsdfuIHq4gpAnc.html"

The site I'm visiting has several of these, and I only need everything between the two periods:

vFIsdfuIHq4gpAnc

I'd like to keep my current format and code, which already works with regular expressions. Please help me adjust the following preg_match_all line:

preg_match_all("(./(.*?).html)", $sp, $content);

I'd really appreciate any help with this, and thank you in advance!

Here is my full code:

$dp = "http://www.cnn.com";

$sp = @file_get_contents($dp);
if ($sp === FALSE) {
    echo("<P>Error: unable to read the URL $dp.  Process aborted.</P>");
    exit();
}

preg_match_all("(./(.*?).html)", $sp, $content); 

foreach($content[1] as $surl) {
    $nctid = str_replace("mv/","",$surl);
    $nctid = str_replace("/","",$nctid);
    echo $nctid,'<br /><br /><br />';
}

The above is what I've been working on.

Answer 1

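Pretty good, really. It's just that you don't want to match .*?; you want to match one or more characters that are not a period, so you can use [^.]+ instead.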

$sp = 'a href="/mv/test-1-2-3-4.vFIsdfuIHq4gpAnc.html"';
preg_match_all( '/\.([^.]+)\.html/', $sp, $content );

var_dump( $content[1] );

This prints:

array(1) {
  [0]=>
  string(16) "vFIsdfuIHq4gpAnc"
}
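
To see why the character class helps, the sketch below runs the question's pattern and a period-excluding pattern against the same sample link. The variable names are illustrative only; the question's pattern is copied as posted, where the outer parentheses act as the regex delimiters.

```php
<?php
// Sample link taken from the question.
$sample = 'a href="/mv/test-1-2-3-4.vFIsdfuIHq4gpAnc.html"';

// Pattern as posted in the question: the lazy .*? starts right after the
// first "/" and therefore also picks up "mv/test-1-2-3-4.".
preg_match('(./(.*?).html)', $sample, $loose);

// [^.]+ cannot cross a period, so only the last segment before ".html" fits.
preg_match('/\.([^.]+)\.html/', $sample, $strict);

echo $loose[1], "\n";   // mv/test-1-2-3-4.vFIsdfuIHq4gpAnc
echo $strict[1], "\n";  // vFIsdfuIHq4gpAnc
```

This is also why the question's code needed the two str_replace() calls afterwards: its capture still contained the "mv/" prefix.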

Here is an example of how to iterate over all the links:

<?php
$url = 'http://www.cnn.com';

$dom = new DomDocument( );
@$dom->loadHTMLFile( $url );

$links = $dom->getElementsByTagName( 'a' );

foreach( $links as $link ) {
    $href = $link->attributes->getNamedItem( 'href' );
    if( $href !== null ) {
        if( preg_match( '~mv/.*?([^.]+)\.html~', $href->nodeValue, $matches ) ) {
            echo "Link-id found: " . $matches[1] . "\n";
        }
    }
}
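
The loop above needs network access. For experimenting offline, here is a minimal sketch that applies the same DOM-plus-regex idea to an HTML string. The function name `extractLinkIds` and the sample markup are made up for illustration and are not part of the original answer; the pattern is also slightly stricter, anchoring on the end of the href.

```php
<?php
/**
 * Collect the segment between the last two periods of every /mv/ link.
 * Hypothetical helper, not part of the original answer.
 */
function extractLinkIds(string $html): array
{
    // Collect parser warnings internally instead of suppressing them with @.
    $previous = libxml_use_internal_errors(true);
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $ids = [];
    foreach ($dom->getElementsByTagName('a') as $link) {
        // getAttribute() returns an empty string when href is missing.
        $href = $link->getAttribute('href');
        if (preg_match('~/mv/.*\.([^.]+)\.html$~', $href, $m)) {
            $ids[] = $m[1];
        }
    }
    return $ids;
}

// Made-up markup for the demonstration.
$html = '<html><body>'
      . '<a href="/mv/test-1-2-3-4.vFIsdfuIHq4gpAnc.html">one</a>'
      . '<a href="/about.html">two</a>'
      . '<a name="no-href">three</a>'
      . '</body></html>';

print_r(extractLinkIds($html));
```

Only the first link has an /mv/ path with two periods, so it is the only one that contributes an id.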

Answer 2

You can use explode():

$string = 'a href="/mv/test-1-2-3-4.vFIsdfuIHq4gpAnc.html"';
if(stripos($string, '/mv/') !== false){
    $dots = explode('.', $string);
    echo $dots[(count($dots)-2)];
}

Answer 3

How about using explode?

$exploded = explode('.', $sp);
$content = $exploded[1]; // string: "vFIsdfuIHq4gpAnc"
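
One caveat with a fixed index: `$exploded[1]` is the right piece only when the string contains exactly two periods. A small sketch, assuming each string holds just the link path (the second path is invented to show the difference), compares the fixed index with counting from the end:

```php
<?php
$paths = [
    '/mv/test-1-2-3-4.vFIsdfuIHq4gpAnc.html', // from the question
    '/mv/v1.2-test.vFIsdfuIHq4gpAnc.html',    // hypothetical path with an extra period
];

$fixed = [];
$fromEnd = [];
foreach ($paths as $path) {
    $parts = explode('.', $path);
    $fixed[]   = $parts[1];                  // second piece from the start
    $fromEnd[] = $parts[count($parts) - 2];  // piece just before "html"
}

print_r($fixed);    // second entry is "2-test", not the id
print_r($fromEnd);  // both entries are the id
```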

Answer 4

Even simpler:

$sp="/mv/test-1-2-3-4.vFIsdfuIHq4gpAnc.html";
$regex = '/\.(?P<value>.*)\./';
preg_match_all($regex, $sp, $content);
echo nl2br(print_r($content["value"], 1));
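
Note that `.*` is greedy, so this pattern captures everything from the first period to the last one. The sketch below, using an invented second path, shows where that coincides with the requested segment and where it does not:

```php
<?php
$regex = '/\.(?P<value>.*)\./';

// Path from the question: exactly two periods, so the capture is the id.
preg_match($regex, '/mv/test-1-2-3-4.vFIsdfuIHq4gpAnc.html', $a);

// Hypothetical path with an extra period: the greedy .* spans both segments.
preg_match($regex, '/mv/v1.2-test.vFIsdfuIHq4gpAnc.html', $b);

echo $a['value'], "\n"; // vFIsdfuIHq4gpAnc
echo $b['value'], "\n"; // 2-test.vFIsdfuIHq4gpAnc
```

If the paths can contain additional periods, a character class such as `[^.]+` followed by `\.html` is the safer choice.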

A better way to strip with a PHP regular expression
