Check whether a string contains a URL and fetch the URL's contents in PHP

Time: 2014-05-16 06:33:45

Tags: php url hyperlink file-get-contents

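Here is a fairly presentable example of what I want to do dynamically.

Suppose someone types a string like this into a textarea:

“The best search engine is www.google.com.”

or

“The best search engine is https://www.google.co.in/?gfe_rd=cr&ei=FLB1U4HHG6aJ8Qfc1YHIBA.”

I then want to highlight the link, the way Stack Overflow does. I also want to use file_get_contents to fetch an image, a short description, and the page title.

Most likely I need to check whether the string contains a URL twice:

  • On keyup in the textarea with jQuery, followed by a file_get_contents request
  • When the string is received by PHP.

How can I do this?

Update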

function parseHyperlinks($text) {
    // The Regular Expression filter
    $reg_exUrl1 = "/(http|https|ftp|ftps)\:\/\/[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,3}(\/\S*)?/";
    $reg_exUrl2 = "/[\w\d\.]+\.(com|org|ca|net|uk)/";
    // The Text you want to filter for urls

    // Check if there is a url in the text
    if (preg_match($reg_exUrl1, $text, $url)) {

        // make the urls hyper links
        return preg_replace($reg_exUrl1, "<a class=\"content-link link\" href=\"{$url[0]}\">{$url[0]}</a> ", $text);

    } else if (preg_match($reg_exUrl2, $text, $url)) {

        return preg_replace($reg_exUrl2, "<a class=\"content-link link\" href=\"{$url[0]}\">{$url[0]}</a> ", $text);

    } else {

        // if no urls in the text just return the text
        return $text;

    }
}
  • This only works for $str='www.google.com is the best' or $str='http://www.google.com is best', but not for $str='http://stackoverflow.com/ and www.google.com is the best'
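
The mixed case fails because the function commits to one pattern through if/else, and it reuses the first match ($url[0]) as the replacement for every match. Below is a minimal sketch of a single-pass alternative. The helper name `linkifyUrls`, the combined pattern, and the rule for trimming trailing punctuation are assumptions for illustration, not part of the original code.

```php
<?php
/**
 * Wrap every URL in $text in an anchor tag (hypothetical helper).
 * Scheme URLs (http, https, ftp, ftps) and bare "www." hosts are
 * handled by one pattern, so a string may mix both kinds.
 */
function linkifyUrls(string $text): string
{
    // The single capture group makes preg_split return the URLs as well:
    // even indexes hold plain text, odd indexes hold the matched URLs.
    $pattern = '~(\b(?:(?:https?|ftps?)://|www\.)[^\s<>"\']+)~i';
    $parts = preg_split($pattern, $text, -1, PREG_SPLIT_DELIM_CAPTURE);

    $out = '';
    foreach ($parts as $i => $part) {
        if ($i % 2 === 0) {
            // Plain text: escape it so user input cannot inject markup
            $out .= htmlspecialchars($part, ENT_QUOTES, 'UTF-8');
            continue;
        }
        // Assumption: trailing sentence punctuation is not part of the URL
        $url = rtrim($part, '.,;:!?)');
        $tail = substr($part, strlen($url));
        // Bare "www." hosts need a scheme to work as an href
        $href = stripos($url, 'www.') === 0 ? 'http://' . $url : $url;

        $out .= '<a class="content-link link" href="'
            . htmlspecialchars($href, ENT_QUOTES, 'UTF-8') . '">'
            . htmlspecialchars($url, ENT_QUOTES, 'UTF-8') . '</a>'
            . htmlspecialchars($tail, ENT_QUOTES, 'UTF-8');
    }
    return $out;
}

echo linkifyUrls('http://stackoverflow.com/ and www.google.com is the best');
```

Because each match is replaced with its own text, the number of URLs in the string no longer matters. Bare domains without "www." (for example google.com) are deliberately not matched in this sketch, since that pattern also catches things like file names.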

1 Answer:

Answer 0: (score: 1)

First you create the HTML, then you need an AJAX request to the server. Consider the following example code:

HTML / jQuery:

<!-- instead of textarea, you could use an editable div for styling highlights, or if you want, just use a plugin -->
<div id="textarea" 
    style="
    font-family: monospace;
    white-space: pre;
    width: 300px;
    height: 200px;
    border: 1px solid #ccc;
    padding: 5px;">For more tech stuff, check out http://www.tomshardware.com/ for news and updates.</div><br/>
<button type="button" id="scrape_site">Scrape</button><br/><br/>
<!-- A button triggers the scraping here; you could bind it to keyup/keydown instead. -->

<div id="site_output" style="width: 500px;">
    <label>Site: <p id="site" style="background-color: gray;"></p></label>
    <label>Title: <p id="title" style="background-color: gray;"></p></label>
    <label>Description: <p id="description" style="background-color: gray;"></p></label>
    <label>Image: <div id="site_image"></div></label>
</div>

<script type="text/javascript" src="jquery.min.js"></script>
<script type="text/javascript">
$(document).ready(function(){

    $('#scrape_site').on('click', function(){
        var value = $.trim($('#textarea').text());
        $('#site, #title, #description').text('');
        $('#site_image').empty();
        $.ajax({
            url: 'index.php', // or your own PHP script that processes the text
            type: 'POST',
            data: {scrape: true, text: value},
            dataType: 'JSON',
            success: function(response) {
                $('#site').text(response.url);
                $('#title').text(response.title);
                $('#description').text(response.description);
                // build the <img> via attributes (no string concatenation) and avoid reusing the id "site_image"
                $('#site_image').empty().append($('<img>', {src: response.src}));
            }
        });
    });

    // An editable div is used so that the text can be styled.
    // Highlighting is left out to keep this answer short; a highlighter plugin can handle it.
    $('#textarea').each(function(){
        this.contentEditable = true;
    });

});
</script>

In the PHP file that handles the request, in this case index.php:

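<?php
if(isset($_POST['scrape'])) {

    $text = isset($_POST['text']) ? $_POST['text'] : '';
    $data = array();

    // EXTRACT URL
    // {2,} rather than {2,3}, so that longer TLDs such as .info are matched in full
    $reg_exurl = "/(http|https|ftp|ftps)\:\/\/[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,}(\/\S*)?/";
    preg_match_all($reg_exurl, $text, $matches);
    // Only one page is scraped: ignore duplicates and keep the last distinct URL
    $urls = array_unique($matches[0]);
    $url = count($urls) > 0 ? end($urls) : '';

    if($url === '') {
        // nothing to scrape: reply with empty fields so the client code still works
        echo json_encode(array('url' => '', 'title' => '', 'description' => '', 'src' => ''));
        exit;
    }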

    // EXTRACT VALUES (scraping of title and descriptions)

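    // Real-world HTML is rarely valid; collect parser warnings internally
    // so they are not printed into the JSON response
    libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTMLFile($url);
    libxml_clear_errors();
    $xpath = new DOMXPath($doc);
    // the page may have no <title> at all
    $titleNode = $xpath->query('//title')->item(0);
    $title = $titleNode ? trim($titleNode->nodeValue) : '';
    $descriptionNodes = $xpath->query('/html/head/meta[@name="description"]/@content');
    if ($descriptionNodes->length == 0) {
        $description = "No description meta tag :(";
    } else {
        // Found one or more descriptions, loop over them (the last one wins)
        foreach ($descriptionNodes as $info) {
            $description = trim($info->value);
        }
    }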

    $data['description'] = $description;
    $data['title'] = $title;
    $data['url'] = $url;

    // SCRAPING OF IMAGE (the weirdest part)
    $image_found = false;
    $data['src'] = '';
    $images = array();

    // get all possible images and this is a little BIT TOUGH
    // check for og:image (facebook), some sites have this, so first lets take a look on this meta
    $facebook_ogimage = $xpath->query("/html/head/meta[@property='og:image']/@content");
    foreach($facebook_ogimage as $ogimage) {
        $data['src'] = $ogimage->nodeValue;
        $image_found = true;
    }

    // desperation search (get images)
    if(!$image_found) {
        $image_list = $xpath->query("//img[@src]");
        for($i=0;$i<$image_list->length; $i++){
            if(strpos($image_list->item($i)->getAttribute("src"), 'ad') === false) {
                $images[] = $image_list->item($i)->getAttribute("src");
            }
        }

        if(count($images) > 0) {
            // if at least one, get it
            $data['src'] = $images[0];
        }
    }

    echo json_encode($data);
    exit;

}
?>
  

Note: This is not perfect, but you can use it as a reference; improve on it and make it more dynamic.
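
One possible improvement is to separate fetching from parsing, so the extraction logic can be exercised without a network call. The sketch below assumes a hypothetical helper `extractPageMeta()` that takes an HTML string; the function name, the array keys, and the fallback order (og:image first, then the first img) are illustrative choices, not part of the answer above.

```php
<?php
/**
 * Extract title, description and a preview image URL from an HTML string.
 * Hypothetical helper: missing values are returned as empty strings.
 */
function extractPageMeta(string $html): array
{
    $meta = array('title' => '', 'description' => '', 'src' => '');
    if (trim($html) === '') {
        return $meta; // nothing to parse
    }

    // Tolerate invalid markup without emitting warnings
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // string(...) yields '' when the node set is empty, so no null checks are needed
    $meta['title'] = trim($xpath->evaluate('string(//title)'));
    $meta['description'] = trim($xpath->evaluate('string(//meta[@name="description"]/@content)'));
    $meta['src'] = trim($xpath->evaluate('string(//meta[@property="og:image"]/@content)'));
    if ($meta['src'] === '') {
        // Fallback: the first <img> in document order
        $meta['src'] = trim($xpath->evaluate('string(//img/@src)'));
    }
    return $meta;
}

// Possible usage with file_get_contents and a timeout (not executed here):
// $context = stream_context_create(array('http' => array('timeout' => 5)));
// $html = @file_get_contents($url, false, $context);
// $data = extractPageMeta($html === false ? '' : $html);

$sample = '<html><head><title> Example </title>'
    . '<meta name="description" content="A demo page">'
    . '</head><body><img src="/b.png"></body></html>';
print_r(extractPageMeta($sample));
```

The image URL returned by the fallback may be relative (as with `/b.png` here), so it would still need to be resolved against the page URL before it is sent to the browser.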