无法通过查询字符串下载PISA数据

时间:2013-02-01 19:21:20

标签: curl bioinformatics

我正在尝试从PDBePISA下载文件。他们建议使用模板网址:

http://www.ebi.ac.uk/pdbe/pisa/cgi-bin/interfaces.pisa?pdbcodelist

其中pdbcodelist必须填写用户的偏好。例如:

http://www.ebi.ac.uk/pdbe/pisa/cgi-bin/interfaces.pisa?4FGF

到目前为止,我没有成功下载该文件。我尝试使用-L重定向,但我没有运气。

如果您在Web浏览器中键入示例URL,则会出现下载提示,允许您下载该文件。我试图用cURL完成同样的任务。 URL确实以某种方式引导到可以下载的文件。但是,到目前为止,如何使用cURL来解决这个问题。

3 个答案:

答案 0 :(得分:1)

似乎问题只是你正在利用PDB代码。在PDBePISA download page上,所有示例代码均为小写。我猜测驱动下载的脚本可能无法处理大写。

当我在浏览器中输入...pisa?4FGF URL时,我得到“未找到条目”XML,但使用小写修复了问题。

在我的机器上,以下内容完全符合预期:

$ curl -O http://www.ebi.ac.uk/msd-srv/pisa/cgi-bin/interfaces.pisa?4fgf

答案 1 :(得分:0)

当你卷曲

http://www.ebi.ac.uk/pdbe/pisa/cgi-bin/interfaces.pisa?4FGF

你得到这个标题

HTTP/1.1 302 Found
Server: Apache
Content-Type: text/html; charset=iso-8859-1
Date: Fri, 01 Feb 2013 19:29:11 GMT
Location: http://www.ebi.ac.uk/msd-srv/pisa/cgi-bin/interfaces.pisa?4FGF
Keep-Alive: timeout=15, max=100
Connection: Keep-Alive
Content-Length: 246

当您按照指定的位置操作时,您将在包含此内容的页面中结束

<pisa_interfaces>
  <status>Ok</status>
  <pdb_entry>
    <pdb_code>4FGF</pdb_code>
    <status>Entry not found</status>
  </pdb_entry>
</pisa_interfaces>

答案 2 :(得分:-1)

你可以使用这个类,这个类的原文在 php.net 我只对它做了一点改动

 <?php 
 function HeaderProc($response,$Run="",$String=1/*[Is 1 IF Use for String Mode ]*/){
          print_r($response);
          if($String==1){
             $response=explode("\r\n",$response);  
          }
          $PartHeader=0;
          $out[$PartHeader]=array();
          while(list($key,$val)=each($response)){
              $name='';
              $value='';
              $flag=false;
              for($i=0;$i<strlen($val);$i++){
                  if($val[$i]==":"){
                      $flag=true;
                      for($j=$i+1;$j<strlen($val);$j++){
                        if($val[$i]=="\r" and $val[$i+1]=="\n"){    
                            break;
                        }
                        $value.=$val[$j];
                      }
                      break;
                  }
                  $name.=$val[$i]; 
              }
              if($flag){
                if($name=='' and $value==''){
                    $PartHeader++;  
                }else{
                  if(isset($out[$PartHeader][$name])){
                    if(is_array($out[$PartHeader][$name])){   
                        $out[$PartHeader][$name][]=$value;
                    }else{
                        $T=$out[$PartHeader][$name];
                        $out[$PartHeader][$name]=array();
                        $out[$PartHeader][$name][0]=$T;  
                        $out[$PartHeader][$name][1]=$value;  
                    }
                  }else{
                    $out[$PartHeader][$name]=$value;
                  }
                }
              }else{
                if($name==''){
                    $PartHeader++;  
                }else{
                    if(isset($out[$PartHeader][$name])){ 
                      if(is_array($out[$PartHeader][$name])){   
                        $out[$PartHeader][$name][]=$value;
                      }else{
                        $T=$out[$PartHeader][$name];
                        $out[$PartHeader][$name]=array();
                        $out[$PartHeader][$name][0]=$T;  
                        $out[$PartHeader][$name][1]=$name;  
                      }
                    }else{
                        $out[$PartHeader][$name]=$name; 
                    }
                } 
              }
              if($Run!=""){
                $Run($name,$value);  
              }
          }
          return $out;
}
 class cURL { 
    var $headers; 
    var $user_agent; 
    var $compression; 
    var $cookie_file; 
    var $proxy; 
    var $Cookie; 
    function CookieAnalysis($Cookie){//convert str cookie to array cookie 
       //echo $Cookie;
       $this->Cookie=array();
       preg_replace_callback("~(.*?)=(.*?);~si",function($m){$this->Cookie[trim($m[1])]=trim($m[2]);},' '.$Cookie.'; ');
       return $this->Cookie;
    }
    function cURL($cookies=false,$cookie='cookies.txt',$compression='gzip',$proxy='') {
         $this->headers[] = 'Accept:text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8';
         $this->headers[] = 'Accept-Charset:ISO-8859-1,utf-8;q=0.7,*;q=0.3'; 
         $this->headers[] = 'Accept-Encoding:gzip,deflate,sdch';
         $this->headers[] = 'Accept-Language:en-US,en;q=0.8';
         $this->headers[] = 'Cache-Control:max-age=0';
         $this->headers[] = 'Connection:keep-alive';
         $this->user_agent = 'User-Agent:Mozilla/5.0 (SepidarSoft [Organic Search Engine Crawler] Linux Edition) AppleWebKit/536.5 (KHTML, like Gecko) SepidarBrowser/1.0.100.52 Safari/536.5';
         $this->compression=$compression; 
         $this->proxy=$proxy; 
         $this->cookies=$cookies; 
         if ($this->cookies == TRUE) $this->cookie($cookie); 
    } 
    function cookie($cookie_file) { 
         if (file_exists($cookie_file)) { 
            $this->cookie_file=$cookie_file; 
         } else { 
            fopen($cookie_file,'w') or $this->error('The cookie file could not be opened. Make sure this directory has the correct permissions');
            $this->cookie_file=$cookie_file; 
            @fclose($this->cookie_file); 
         } 
    }
    function GET($url) { 
         $process = curl_init($url); 
         curl_setopt($process, CURLOPT_HTTPHEADER, $this->headers); 
         curl_setopt($process, CURLOPT_HEADER, 1); 
         curl_setopt($process, CURLOPT_USERAGENT, $this->user_agent); 
         if ($this->cookies == TRUE) curl_setopt($process, CURLOPT_COOKIEFILE, $this->cookie_file);
         if ($this->cookies == TRUE) curl_setopt($process, CURLOPT_COOKIEJAR, $this->cookie_file);
         curl_setopt($process,CURLOPT_ENCODING , $this->compression); 
         curl_setopt($process, CURLOPT_TIMEOUT, 30); 
         if ($this->proxy) curl_setopt($process, CURLOPT_PROXY, $this->proxy); 
         curl_setopt($process, CURLOPT_RETURNTRANSFER, 1); 
         curl_setopt($process, CURLOPT_FOLLOWLOCATION, 1); 
         $response = curl_exec($process);
         $header_size = curl_getinfo($process,CURLINFO_HEADER_SIZE);
         $result['Header'] = HeaderProc(substr($response, 0, $header_size),'',1);
         foreach($result['Header'] as $HeaderK=>$HeaderP){
           foreach($HeaderP['Set-Cookie'] as $key=>$val){
             $result['Header'][$HeaderK]['Set-Cookie'][$key]=$this->CookieAnalysis($val);
           }
         }
         $result['Body'] = substr( $response, $header_size );
         $result['HTTP_State'] = curl_getinfo($process,CURLINFO_HTTP_CODE);
         $result['URL'] = curl_getinfo($process,CURLINFO_EFFECTIVE_URL); 
         curl_close($process); 
         return $result; 
    }
    function POST($url,$data) { 
         $process = curl_init($url); 
         curl_setopt($process, CURLOPT_HTTPHEADER, $this->headers); 
         curl_setopt($process, CURLOPT_HEADER, 1); 
         curl_setopt($process, CURLOPT_USERAGENT, $this->user_agent); 
         if ($this->cookies == TRUE) curl_setopt($process, CURLOPT_COOKIEFILE, $this->cookie_file);
         if ($this->cookies == TRUE) curl_setopt($process, CURLOPT_COOKIEJAR, $this->cookie_file);
         curl_setopt($process, CURLOPT_ENCODING , $this->compression); 
         curl_setopt($process, CURLOPT_TIMEOUT, 30); 
         if ($this->proxy) curl_setopt($process, CURLOPT_PROXY, $this->proxy); 
         curl_setopt($process, CURLOPT_POSTFIELDS, $data); 
         curl_setopt($process, CURLOPT_RETURNTRANSFER, 1); 
         curl_setopt($process, CURLOPT_FOLLOWLOCATION, 1); 
         curl_setopt($process, CURLOPT_POST, 1);
         $response = curl_exec($process); 
         $header_size = curl_getinfo($process,CURLINFO_HEADER_SIZE);
         $result['Header'] = HeaderProc(substr($response, 0, $header_size),'',1);
         foreach($result['Header'] as $HeaderK=>$HeaderP){
           foreach($HeaderP['Set-Cookie'] as $key=>$val){
             $result['Header'][$HeaderK]['Set-Cookie'][$key]=$this->CookieAnalysis($val);
           }
         }
         $result['Body'] = substr( $response, $header_size );
         $result['HTTP_State'] = curl_getinfo($process,CURLINFO_HTTP_CODE);
         $result['URL'] = curl_getinfo($process,CURLINFO_EFFECTIVE_URL);
         curl_close($process); 
         return $result; 
    }
    function error($error) { 
         echo "<center><div style='width:500px;border: 3px solid #FFEEFF; padding: 3px; background-color: #FFDDFF;font-family: verdana; font-size: 10px'><b>cURL Error</b><br>$error</div></center>";
         die; 
    } 
 } 

?>

此代码使您可以轻松使用Curl;

初始化为Curl:tou可以看到脚本中的参数但是如果你想要可以使用这样的默认模式:

 $cc = new cURL(); 

for post post request你可以使用打击代码

 $response=$cc->post('http://www.sepidarsoft.com','foo=bar');
 print_r($response);

或者获取请求您可以使用打击代码

 $response=$cc->get('http://www.sepidarsoft.com');
 print_r($response);

此代码支持重定向