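Prevent loading from a remote source if the file is larger than a given size

Time: 2016-04-21 06:30:55

Tags: php domdocument filesize

Suppose I want to load XML files of at most 10MB from a remote server.

Something like this: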
$xml_file = "http://example.com/largeXML.xml";// size= 500MB

//PRACTICAL EXAMPLE: $xml_file = "http://www.cs.washington.edu/research/xmldatasets/data/pir/psd7003.xml";// size= 683MB

 /* GOAL: prevent this large file from being loaded by DOMDocument, without having to download the file first just to check its size */

$dom =  new DOMDocument();

$dom->load($xml_file /*LOAD only IF the file_size is <= 10MB....else...echo 'File is too large'*/);

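How can this be achieved? Any ideas or alternatives, or pointers to the best way of doing this, would be highly appreciated.

I looked at "PHP: Remote file size without downloading file", but when I try something like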
var_dump(
    curl_get_file_size(
        "http://www.dailymotion.com/rss/user/dialhainaut/"
    )
);

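I get string 'unknown' (length=7)

When I tried get_headers as suggested below, Content-Length was missing from the headers, so that does not work reliably either.

Please advise how to determine the length, and how to avoid passing the file to DOMDocument if it exceeds 10MB.

3 Answers:

Answer 0: (score: 2)

OK, finally got it working. The header-based solutions clearly do not work universally. In this solution we open a file handle and read the XML line by line until the $max_B threshold is reached. If the file is too large, we still incur the overhead of reading it up to the 10MB mark, but it works as expected. If the file is smaller than $max_B, we proceed...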

$xml_file = "http://www.dailymotion.com/rss/user/dialhainaut/";
//$xml_file = "http://www.cs.washington.edu/research/xmldatasets/data/pir/psd7003.xml";

$fh = fopen($xml_file, "r");  

if($fh){
    $file_string = '';
    $total_B = 0;
    $max_B = 10485760;
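    // Read the file line by line, concatenating the lines into a string.
    // Each read is capped at 8191 bytes so one huge line cannot bypass the limit.
    while (($line = fgets($fh, 8192)) !== false) {
        $total_B += strlen($line);
        if ($total_B >= $max_B) {
            break; // threshold reached: stop reading
        }
        $file_string .= $line;
    }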

    if($total_B < $max_B){
        echo 'File ok. Total size = '.$total_B.' bytes. Proceeding...';
        //proceed
        $dom = new DOMDocument();
        $dom->loadXML($file_string); //NOTE the method change because we're loading from a string   

    } else {
        //reject
        echo 'File too big! Max size = '.$max_B.' bytes.';  
    }

    fclose($fh);

} else {
    echo 'Could not open the remote file!';
}
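
A possible alternative to the line-by-line loop, assuming `allow_url_fopen` is enabled: `file_get_contents` accepts a maximum length, so asking for one byte more than the limit is enough to tell whether the source is too large. The helper name `read_capped` and its convention of returning `null` for "too large or unreadable" are illustrative assumptions, not part of the answer above.

```php
<?php

/**
 * Read at most $max_bytes from $source (a URL or a local path).
 * Returns the contents, or null if the source is larger than the
 * limit or cannot be read. (Hypothetical helper for illustration.)
 */
function read_capped(string $source, int $max_bytes): ?string
{
    // Request one byte more than allowed: if it arrives, the source is too big.
    $data = @file_get_contents($source, false, null, 0, $max_bytes + 1);

    if ($data === false || strlen($data) > $max_bytes) {
        return null;
    }

    return $data;
}
```

Usage would be along the lines of `$xml = read_capped($xml_file, 10485760);`, passing the result to `$dom->loadXML($xml)` only when it is not `null`. As with the loop above, an oversized file still costs a download of up to the limit, but never more.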

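Answer 1: (score: 1)

10MB equals 10485760 bytes. If Content-Length is not specified, curl is used instead, which has been available since PHP 5. I got this source from somewhere on SO, but I cannot remember where: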

      String filepath="directory containing file ";
      ContextWrapper contextWrapper = new ContextWrapper(getApplicationContext());
      File directory = contextWrapper.getDir(filepath, Context.MODE_PRIVATE);
      File myInternalFile = new File(directory , filename);

Check demo here

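Answer 2: (score: -1)

Edit: the new answer is a bit more involved:
You cannot check the length through the DOM elements; however, you can issue a HEAD request and get the file size from the URL: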

<?php

function i_hope_this_works( $XmlUrl ) {
    // Assume failure until proven otherwise: -1 means the request did not succeed
    $result = -1;

    $request = curl_init( $XmlUrl );

    // Issue a HEAD request, so a 1 GB file costs about the same as a 1 KB file
    curl_setopt( $request, CURLOPT_NOBODY, true );
    curl_setopt( $request, CURLOPT_HEADER, true );
    curl_setopt( $request, CURLOPT_RETURNTRANSFER, true );
    curl_setopt( $request, CURLOPT_FOLLOWLOCATION, true );
    // Placeholder user agent string (the original called an undefined helper here)
    curl_setopt( $request, CURLOPT_USERAGENT, 'Mozilla/5.0 (compatible; file-size-check)' );

    $requesteddata = curl_exec( $request );
    curl_close( $request );

    if( $requesteddata ) {
        $content_length = "unknown";
        $status = "unknown";

        // Accept HTTP/1.0, HTTP/1.1 and HTTP/2 status lines
        if( preg_match( "/^HTTP\/[\d.]+ (\d\d\d)/", $requesteddata, $matches ) ) {
            $status = (int)$matches[1];
        }

        // Header names are case-insensitive, hence the "i" modifier
        if( preg_match( "/Content-Length: (\d+)/i", $requesteddata, $matches ) ) {
            $content_length = (int)$matches[1];
        }

        // 200 is OK, 301 to 308 are redirects; see any list of HTTP status codes
        if( $status == 200 || ($status > 300 && $status <= 308) ) {
            $result = $content_length;
        }
    }

    return $result;
}
?>

You should now be able to get the size of any file you need with just

$file_size = i_hope_this_works('yourURLasString');
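
One caveat with the function above: with `CURLOPT_FOLLOWLOCATION` enabled, the returned string contains one header block per redirect hop, so the first `Content-Length` found may belong to a redirect response rather than to the file itself. A minimal sketch of reading the value from the last header block only; `final_content_length` is a hypothetical helper name, and returning `null` for an unannounced size is an assumption made here:

```php
<?php

/**
 * Extract Content-Length from the last header block of a raw
 * header string such as the one curl returns with CURLOPT_HEADER.
 * Returns null if the final response does not announce a size.
 */
function final_content_length(string $raw_headers): ?int
{
    // Header blocks are separated by a blank line; keep the last one.
    $blocks = preg_split('/\r?\n\r?\n/', trim($raw_headers));
    $last = end($blocks);

    // "m" lets ^ match at each line start; "i" ignores header-name case.
    if (preg_match('/^Content-Length:\s*(\d+)/mi', $last, $matches)) {
        return (int) $matches[1];
    }

    return null;
}
```

A `null` result corresponds to the 'unknown' case from the question, where the only safe option is to read the body with a hard byte limit. cURL also exposes the announced size through `curl_getinfo($request, CURLINFO_CONTENT_LENGTH_DOWNLOAD)`, which reports -1 when the length is unknown; it has to be called before `curl_close`.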